ABSTRACT

We describe a probabilistic method to cross-identify astrophysical sources in different catalogs from their positions, and provide the probability that an object is associated with a source from another catalog or that it has no counterpart. We first consider the classical case of several-to-one associations, and then the more realistic but more difficult problem of one-to-one associations.

In either case, we compute the likelihood of observing the objects in the two catalogs at their registered positions, and build a maximum likelihood estimator of the fraction of sources with a counterpart. When the positional uncertainty in one or both catalogs is unknown, this method may be used to derive its typical value and even to study its dependence on the size of objects; it may also be applied when the true centers of a source and of its counterpart at another wavelength do not coincide.

To compute the likelihood and association probabilities in the various cases, a Fortran 95 code called Aspects ([aspɛ], "Association positionnelle/probabiliste de catalogues de sources" in French) was developed; we make its source files freely available. To test Aspects, we created all-sky mock catalogs containing up to 10^ objects. After analysis of these simulations, we are able to make some recommendations on the choice of the source association model and of estimators of unknown parameters.

1. Introduction

The problem of cross-identifying sources between two catalogs $K$ and $K'$ has previously been studied by Condon et al. (1975), de Ruiter et al. (1977), Prestage & Peacock (1983), Sutherland & Saunders (1992) and Rutledge et al. (2000), among others. As evidenced by recent papers of Budavári & Szalay (2008) and Pineau et al. (2011), this field is still very active and will be more so with the wealth of forthcoming multiwavelength data. Usually, the association is performed using a "likelihood ratio": for two objects $(M_i, M'_j) \in K \times K'$ with known coordinates and positional uncertainties, and given the local surface density of $K'$-sources, this ratio is typically computed as $\mathrm{LR} := P(\text{position} \mid \text{counterpart})/P(\text{position} \mid \text{chance})$, where $P(\text{position} \mid \text{counterpart})$ is the probability of finding $M'_j$ at some position relative to $M_i$ if $M'_j$ is a counterpart of $M_i$, and $P(\text{position} \mid \text{chance})$ is the probability that $M'_j$ is there by chance. As noticed by Sutherland & Saunders (1992), there has been some confusion in the definition and interpretation of LR, and, more importantly, in the estimation of the probability^2 that a source in $K'$ is the counterpart of an object in $K$.

When associating sources from catalogs at different wavelengths, some authors include in this likelihood ratio some a priori information on the spectral energy distribution (SED) of the objects. As this work began, our primary goal was to build template observational SEDs from the optical to the far-infrared for different types of galaxies. We initially intended to cross-identify the Iras Faint Source Survey (Moshir et al. 1992, 1993) with the Leda database (Paturel et al. 1995). Because of the large positional inaccuracy of Iras data, special care was needed to identify optical sources with infrared ones. While Iras data are by now quite outdated and have been superseded by Spitzer observations, we still think that the procedure we developed at that time may be valuable for other studies. Because we aimed to fit synthetic SEDs to the template observational ones, we could not and did not want to make assumptions on the SED of sources based on their type, since this would have biased the procedure. We therefore rely in what follows only on positions to associate sources between catalogs.

The method we use is in essence similar to that of Sutherland & Saunders (1992). Because thinking in terms of probabilities rather than of likelihood ratios highlights some implicit assumptions, we nevertheless found it useful, for the sake of clarity, to detail our calculations hereafter; this moreover allows us to extend our work to a case not covered by the papers cited above (see Sect. 4).

We define our notations and state our general assumptions in Sect. 2. In Sect. 3, we compute the probability of association under the assumption that a $K$-source has at most one counterpart in $K'$ but that several $K$-sources may have the same counterpart ("several-to-one" associations); we moreover determine the fraction of objects with a counterpart and, if unknown, estimate the positional uncertainty in one or both catalogs. In Sect. 4 (and App. B), we do the same calculations under the assumption that a $K$-source has at most one counterpart in $K'$ and that no other $K$-source has the same counterpart ("one-to-one" associations). We explain in Sect. 5 how to implement these results in practice. We present in Sect. 6 a code, Aspects, with which we compute the likelihoods and probabilities of association in the various cases, and test it on simulations in Sect. 7. The probability distribution of the relative positions of associated sources is modeled in App. A.

^1 Available at www2.iap.fr/users/fioc/Aspects/ .

^2 For instance, de Ruiter et al. (1977) state that, if there is a counterpart, the closest object is always the right one, which is wrong.

2. Notations and general assumptions

We consider two catalogs $K$ and $K'$ defined on a common surface of the sky, of area $S$, and use the following notations:

- $\#E$: number of elements of any set $E$;

- $M_1, \ldots, M_n$, with $n := \#K$: sources in $K$;

- $M'_1, \ldots, M'_{n'}$, with $n' := \#K'$: sources in $K'$.

We define the following events:

- $c_i$: $M_i$ is in the infinitesimal surface element $\mathrm{d}^2 r_i$ located at $r_i$;

- $c'_j$: $M'_j$ is in the surface element $\mathrm{d}^2 r'_j$ located at $r'_j$;

- $C := \bigcap_{i=1}^{n} c_i$: the coordinates of all $K$-sources are known;

- $C' := \bigcap_{j=1}^{n'} c'_j$: the coordinates of all $K'$-sources are known;

- $A_{i,j}$, with $j > 0$: $M'_j$ is a counterpart of $M_i$;

- $A_{i,0}$: $M_i$ has no counterpart in $K'$, i.e. $A_{i,0} = \overline{\bigcup_{j>0} A_{i,j}}$, where $\overline{\omega}$ denotes the negation of an event $\omega$;

- $A_{0,j}$: $M'_j$ has no counterpart in $K$.

We also write $f$ the a priori probability $P(\bigcup_{j>0} A_{i,j})$ that an element of $K$ has a counterpart in $K'$ (so, $P(A_{i,0}) = 1 - f$); we will see in Sects. 3.2 and 4.2 how to estimate $f$. We moreover assume that any $M_i$ has at most one counterpart in $K'$: $A_{i,j} \cap A_{i,k} = \emptyset$ if $j \neq k$.

Clustering is neglected throughout the paper.

3. Several-to-one associations 

In this section, we do not make any assumption on the number of $K$-sources that may be the counterpart of a given $K'$-source: this is a reasonable hypothesis if the angular resolution in $K'$ (e.g. Iras) is much poorer than in $K$ (e.g. Leda), since, in that case, several distinct objects of $K$ may be confused in $K'$. As evidenced by Sect. 3.3, this is also the assumption implicitly made by most of the authors cited in the introduction. We call this the "several-to-one" case (subscript "s:o" hereafter).



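3.1. Probability of association: all-sky computation

We want to compute^3 the probability $P_{s:o}(A_{i,j} \mid C \cap C')$ of association between sources $M_i$ and $M'_j$ ($j > 0$) or the probability that $M_i$ has no counterpart ($j = 0$), knowing the coordinates of all the objects in $K$ and $K'$. Remembering that, for any events $\omega_1$, $\omega_2$ and $\omega_3$, $P(\omega_1 \mid \omega_2) = P(\omega_1 \cap \omega_2)/P(\omega_2)$ and $P(\omega_1 \cap \omega_2 \mid \omega_3) = P(\omega_1 \mid \omega_2 \cap \omega_3)\, P(\omega_2 \mid \omega_3)$, we have

$$P_{s:o}(A_{i,j} \mid C \cap C') = \frac{P_{s:o}(A_{i,j} \cap C \cap C')}{P_{s:o}(C \cap C')} = \frac{P_{s:o}(A_{i,j} \cap C \mid C')}{P_{s:o}(C \mid C')}. \tag{1}$$

We first compute $P_{s:o}(C \mid C')$. Using the symbol $\biguplus$ for mutually exclusive events instead of $\bigcup$, we obtain

$$\begin{aligned}
P_{s:o}(C \mid C') &= P_{s:o}\Bigl(C \cap \biguplus_{j_1=0}^{n'} \biguplus_{j_2=0}^{n'} \cdots \biguplus_{j_n=0}^{n'} \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Bigr) = \sum_{j_1=0}^{n'} \sum_{j_2=0}^{n'} \cdots \sum_{j_n=0}^{n'} P_{s:o}\Bigl(C \cap \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Bigr) \\
&= \sum_{j_1=0}^{n'} \sum_{j_2=0}^{n'} \cdots \sum_{j_n=0}^{n'} P_{s:o}\Bigl(C \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Bigr)\, P_{s:o}\Bigl(\bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Bigr).
\end{aligned} \tag{2}$$

One has

$$\begin{aligned}
P_{s:o}\Bigl(C \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Bigr) &= P_{s:o}\Bigl(c_1 \Bigm| \bigcap_{k=2}^{n} c_k \cap \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Bigr)\, P_{s:o}\Bigl(\bigcap_{k=2}^{n} c_k \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Bigr) \\
&= \prod_{\ell=1}^{n} P_{s:o}\Bigl(c_\ell \Bigm| \bigcap_{k=\ell+1}^{n} c_k \cap \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Bigr)
\end{aligned} \tag{3}$$

by iteration.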

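If $j_\ell \neq 0$, since $M_\ell$ is associated with $M'_{j_\ell}$ only,

$$P_{s:o}\Bigl(c_\ell \Bigm| \bigcap_{k=\ell+1}^{n} c_k \cap \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Bigr) = P_{s:o}(c_\ell \mid A_{\ell,j_\ell} \cap c'_{j_\ell}) =: \xi_{\ell,j_\ell}\, \mathrm{d}^2 r_\ell, \tag{4}$$

where (cf. App. A)

$$\xi_{\ell,j_\ell} = \frac{\exp\bigl(-\tfrac{1}{2}\, r_{\ell,j_\ell}^{\mathsf T}\, \Gamma_{\ell,j_\ell}^{-1}\, r_{\ell,j_\ell}\bigr)}{2\pi\, (\det \Gamma_{\ell,j_\ell})^{1/2}},$$

$r_{\ell,j_\ell} := r'_{j_\ell} - r_\ell$ and $\Gamma_{\ell,j_\ell}$ is the covariance matrix of $r_{\ell,j_\ell}$. (Note that, in the several-to-one case considered here, the computation of $P_{s:o}(C \mid C')$ is easier than that of $P_{s:o}(C' \mid C)$: because several $M_\ell$ may be associated with the same $M'_k$, the latter would require calculating $P_{s:o}(c'_k \mid \bigcap_{\ell=1;\, j_\ell=k}^{n} [c_\ell \cap A_{\ell,k}])$. This does not matter in the one-to-one case studied in Sect. 4.)

If $j_\ell = 0$, since $M_\ell$ is not associated with any source in $K'$ and clustering is neglected,

$$P_{s:o}\Bigl(c_\ell \Bigm| \bigcap_{k=\ell+1}^{n} c_k \cap \bigcap_{k=1}^{n'} c'_k \cap \bigcap_{k=1}^{n} A_{k,j_k}\Bigr) = P_{s:o}(c_\ell \mid A_{\ell,0}) =: \xi_{\ell,0}\, \mathrm{d}^2 r_\ell, \tag{5}$$

where $\xi_{\ell,0} = 1/S$ for a uniform distribution of $K$-sources without counterpart as prior.

From Eqs. (3), (4) and (5), it follows that

$$P_{s:o}\Bigl(C \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Bigr) = \Lambda \prod_{k=1}^{n} \xi_{k,j_k}, \tag{6}$$

where $\Lambda := \prod_{k=1}^{n} \mathrm{d}^2 r_k$.

We now compute $P_{s:o}(\bigcap_{k=1}^{n} A_{k,j_k} \mid C')$. Without any other assumption, $P_{s:o}(\bigcap_{k=1}^{n} A_{k,j_k} \mid C') = P_{s:o}(\bigcap_{k=1}^{n} A_{k,j_k})$. Let $m := \#\{j_k > 0;\ k \in [\![1,n]\!]\}$. Since a given $M'_j$ may be the counterpart of several $M_k$ (i.e. the events $(A_{k,j_k})_{k \in [\![1,n]\!]}$ are independent whatever the values of indices $j_k$),

$$P_{s:o}\Bigl(\bigcap_{k=1}^{n} A_{k,j_k}\Bigr) = \prod_{k=1}^{n} P_{s:o}(A_{k,j_k}).$$

As $P_{s:o}(A_{k,0}) = 1 - f$ and $P_{s:o}(A_{k,j_k}) = f/n'$ for $j_k > 0$,

$$P_{s:o}\Bigl(\bigcap_{k=1}^{n} A_{k,j_k}\Bigr) = \Bigl(\frac{f}{n'}\Bigr)^{m} (1-f)^{n-m}. \tag{7}$$

Hence, from Eqs. (2), (6) and (7),

$$P_{s:o}(C \mid C') = \Lambda \sum_{j_1=0}^{n'} \sum_{j_2=0}^{n'} \cdots \sum_{j_n=0}^{n'} \Bigl(\frac{f}{n'}\Bigr)^{m} (1-f)^{n-m} \prod_{k=1}^{n} \xi_{k,j_k} = \Lambda \sum_{j_1=0}^{n'} \sum_{j_2=0}^{n'} \cdots \sum_{j_n=0}^{n'} \prod_{k=1}^{n} \zeta_{k,j_k} = \Lambda \prod_{k=1}^{n} \sum_{j_k=0}^{n'} \zeta_{k,j_k}, \tag{8}$$

where $\zeta_{k,0} := (1-f)\, \xi_{k,0}$ and $\zeta_{k,j_k} := f\, \xi_{k,j_k}/n'$ if $j_k > 0$.

^3 For the sake of clarity, let us mention that we adopt the same decreasing order of precedence for operators as in Mathematica (Wolfram 1996): $\times$ and $/$; $\prod$; $\sum$; $+$ and $-$.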

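The computation of $P_{s:o}(A_{i,j} \cap C \mid C')$ is similar to that of $P_{s:o}(C \mid C')$:

$$\begin{aligned}
P_{s:o}(A_{i,j} \cap C \mid C') &= P_{s:o}\Bigl(C \cap A_{i,j} \cap \biguplus_{j_1=0}^{n'} \cdots \biguplus_{j_{i-1}=0}^{n'} \biguplus_{j_{i+1}=0}^{n'} \cdots \biguplus_{j_n=0}^{n'} \bigcap_{k=1;\, k \neq i}^{n} A_{k,j_k} \Bigm| C'\Bigr) \\
&= P_{s:o}\Bigl(C \cap \biguplus_{j_1=0}^{n'} \cdots \biguplus_{j_{i-1}=0}^{n'} \biguplus_{j_{i+1}=0}^{n'} \cdots \biguplus_{j_n=0}^{n'} \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Bigr) \\
&= \sum_{j_1=0}^{n'} \cdots \sum_{j_{i-1}=0}^{n'} \sum_{j_{i+1}=0}^{n'} \cdots \sum_{j_n=0}^{n'} P_{s:o}\Bigl(C \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Bigr)\, P_{s:o}\Bigl(\bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Bigr),
\end{aligned} \tag{9}$$

where we have put $j_i := j$.

Let $m^* := \#\{j_k > 0;\ k \in [\![1,n]\!]\}$ (indices $j_k$ are now those of Eq. (9)). As for $P_{s:o}(C \mid C')$,

$$\begin{aligned}
P_{s:o}(A_{i,j} \cap C \mid C') &= \Lambda \sum_{j_1=0}^{n'} \cdots \sum_{j_{i-1}=0}^{n'} \sum_{j_{i+1}=0}^{n'} \cdots \sum_{j_n=0}^{n'} \Bigl(\frac{f}{n'}\Bigr)^{m^*} (1-f)^{n-m^*} \prod_{k=1}^{n} \xi_{k,j_k} \\
&= \Lambda\, \zeta_{i,j} \sum_{j_1=0}^{n'} \cdots \sum_{j_{i-1}=0}^{n'} \sum_{j_{i+1}=0}^{n'} \cdots \sum_{j_n=0}^{n'} \prod_{k=1;\, k \neq i}^{n} \zeta_{k,j_k} = \Lambda\, \zeta_{i,j} \prod_{k=1;\, k \neq i}^{n} \sum_{j_k=0}^{n'} \zeta_{k,j_k}.
\end{aligned} \tag{10}$$

Finally, from Eqs. (1), (8) and (10),

$$P_{s:o}(A_{i,j} \mid C \cap C') = \frac{\zeta_{i,j} \prod_{k=1;\, k \neq i}^{n} \sum_{j_k=0}^{n'} \zeta_{k,j_k}}{\prod_{k=1}^{n} \sum_{j_k=0}^{n'} \zeta_{k,j_k}} = \frac{\zeta_{i,j}}{\sum_{k=0}^{n'} \zeta_{i,k}}, \tag{11}$$

that is,

$$P_{s:o}(A_{i,j} \mid C \cap C') = \begin{cases} \dfrac{f\, \xi_{i,j}}{(1-f)\, n'\, \xi_{i,0} + f \sum_{k=1}^{n'} \xi_{i,k}} & \text{if } j > 0, \\[2ex] \dfrac{(1-f)\, n'\, \xi_{i,0}}{(1-f)\, n'\, \xi_{i,0} + f \sum_{k=1}^{n'} \xi_{i,k}} & \text{if } j = 0. \end{cases} \tag{12}$$

The probability $P_{s:o}(A_{0,j} \mid C \cap C')$ that $M'_j$ has no counterpart in $K$ can be computed in this way:

$$\begin{aligned}
P_{s:o}(A_{0,j} \cap C \mid C') &= P_{s:o}\Bigl(C \cap A_{0,j} \cap \biguplus_{j_1=0}^{n'} \biguplus_{j_2=0}^{n'} \cdots \biguplus_{j_n=0}^{n'} \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Bigr) = P_{s:o}\Bigl(C \cap \biguplus_{j_1=0;\, j_1 \neq j}^{n'} \biguplus_{j_2=0;\, j_2 \neq j}^{n'} \cdots \biguplus_{j_n=0;\, j_n \neq j}^{n'} \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Bigr) \\
&= \Lambda \sum_{j_1=0;\, j_1 \neq j}^{n'} \sum_{j_2=0;\, j_2 \neq j}^{n'} \cdots \sum_{j_n=0;\, j_n \neq j}^{n'} \prod_{k=1}^{n} \zeta_{k,j_k} = \Lambda \prod_{k=1}^{n} \sum_{j_k=0;\, j_k \neq j}^{n'} \zeta_{k,j_k}
\end{aligned}$$

and

$$P_{s:o}(A_{0,j} \mid C \cap C') = \frac{P_{s:o}(A_{0,j} \cap C \mid C')}{P_{s:o}(C \mid C')} = \prod_{k=1}^{n} \frac{\sum_{j_k=0;\, j_k \neq j}^{n'} \zeta_{k,j_k}}{\sum_{j_k=0}^{n'} \zeta_{k,j_k}} = \prod_{k=1}^{n} \Bigl(1 - \frac{\zeta_{k,j}}{\sum_{j_k=0}^{n'} \zeta_{k,j_k}}\Bigr) = \prod_{k=1}^{n} \bigl(1 - P_{s:o}[A_{k,j} \mid C \cap C']\bigr). \tag{13}$$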
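As a numerical illustration of Eqs. (12) and (13), the following Python sketch evaluates the several-to-one probabilities from a table of densities $\xi_{i,k}$. It is not taken from the Aspects Fortran code: the function names `several_to_one_row` and `unmatched_in_k_prime`, the list-of-lists layout and the toy values are assumptions made for this example only.

```python
from math import prod


def several_to_one_row(xi_row, xi0, f):
    """Return [P(A_i0), P(A_i1), ..., P(A_in')] for one source M_i (Eq. 12).

    xi_row: xi_{i,k} for k = 1..n'; xi0: xi_{i,0} (1/S for a uniform prior);
    f: a priori fraction of K-sources with a counterpart.
    """
    n_prime = len(xi_row)
    no_match = (1.0 - f) * n_prime * xi0
    denom = no_match + f * sum(xi_row)
    return [no_match / denom] + [f * x / denom for x in xi_row]


def unmatched_in_k_prime(p):
    """Return [P(A_01), ..., P(A_0n')] from the matrix p[i][j] (Eq. 13)."""
    n_prime = len(p[0]) - 1
    return [prod(1.0 - row[j] for row in p) for j in range(1, n_prime + 1)]


# Toy example (hypothetical values): two K-sources, two K'-sources.
xi = [[2.0, 0.0], [2.0, 0.0]]
p = [several_to_one_row(row, xi0=1.0, f=0.5) for row in xi]
p0 = unmatched_in_k_prime(p)
```

Each row of `p` sums to one by construction, whereas nothing prevents the probabilities attached to a given $K'$-source from summing to more than one over the $K$-sources, which is the asymmetry of the several-to-one model.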

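3.2. Fraction of sources with a counterpart and other unknown parameters
3.2.1. Maximum likelihood estimates

Besides $f$, the probabilities $P(A_{i,j} \mid C \cap C')$ may depend on other unknown parameters, e.g. $\mathring{\sigma}$ and $\nu$ (cf. App. A). We write here $x_1$, $x_2$, etc., all these parameters, and put $x := (x_1, x_2, \ldots)$. An estimate $\hat{x}$ of $x$ may be obtained by maximizing, with respect to $x$ (and with the constraint $\hat{f} \in [0, 1]$), the likelihood $L$ to observe all the $K$- and $K'$-sources at their effective positions.

This likelihood is defined by

$$P(C \cap C') =: \Bigl(\prod_{i=1}^{n} \mathrm{d}^2 r_i\Bigr) \Bigl(\prod_{j=1}^{n'} \mathrm{d}^2 r'_j\Bigr)\, L,$$

and $P(C \cap C') = P(C \mid C')\, P(C')$. Since $P(C') = \prod_{j=1}^{n'} P(c'_j)$ and $P(c'_j) = \xi_{0,j}\, \mathrm{d}^2 r'_j$, where $\xi_{0,j} = 1/S$ for a uniform distribution of $K'$-sources,

$$L = \frac{P(C \mid C')}{\Lambda} \prod_{j=1}^{n'} \xi_{0,j}.$$

As the $\xi_{0,j}$ are independent from $x$, one just has to solve

$$\frac{\partial \ln L}{\partial x} = \frac{1}{P(C \mid C')}\, \frac{\partial P(C \mid C')}{\partial x} = 0 \tag{14}$$

to determine $\hat{x}$.

In the several-to-one case,

$$L_{s:o} = \Bigl(\prod_{i=1}^{n} \sum_{k=0}^{n'} \zeta_{i,k}\Bigr) \prod_{j=1}^{n'} \xi_{0,j}. \tag{15}$$

Therefore, for any parameter $x_p$,

$$\frac{\partial \ln L_{s:o}}{\partial x_p} = \sum_{i=1}^{n} \sum_{j=0}^{n'} \frac{\partial \zeta_{i,j}/\partial x_p}{\sum_{k=0}^{n'} \zeta_{i,k}} = \sum_{i=1}^{n} \sum_{j=0}^{n'} \frac{\partial \ln \zeta_{i,j}}{\partial x_p}\, \frac{\zeta_{i,j}}{\sum_{k=0}^{n'} \zeta_{i,k}} = \sum_{i=1}^{n} \sum_{j=0}^{n'} \frac{\partial \ln \zeta_{i,j}}{\partial x_p}\, P_{s:o}(A_{i,j} \mid C \cap C'). \tag{16}$$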

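Let us consider in particular the case $x_p = f$. Note that $\partial \ln \zeta_{i,0}/\partial f = -1/(1-f)$ and $\partial \ln \zeta_{i,j}/\partial f = 1/f$ for $j > 0$. Since $\sum_{j=0}^{n'} P_{s:o}(A_{i,j} \mid C \cap C') = 1$,

$$\sum_{j=0}^{n'} \frac{\partial \ln \zeta_{i,j}}{\partial f}\, P_{s:o}(A_{i,j} \mid C \cap C') = -\frac{P_{s:o}(A_{i,0} \mid C \cap C')}{1-f} + \sum_{j=1}^{n'} \frac{P_{s:o}(A_{i,j} \mid C \cap C')}{f} = -\frac{P_{s:o}(A_{i,0} \mid C \cap C')}{1-f} + \frac{1 - P_{s:o}(A_{i,0} \mid C \cap C')}{f} = \frac{(1-f) - P_{s:o}(A_{i,0} \mid C \cap C')}{f\,(1-f)}. \tag{17}$$

Summing on $i$, we obtain

$$\frac{\partial \ln L_{s:o}}{\partial f} = \frac{n\,(1-f) - \sum_{i=1}^{n} P_{s:o}(A_{i,0} \mid C \cap C')}{f\,(1-f)}. \tag{18}$$

So, an estimate of the fraction $f$ of $K$-sources with a counterpart in $K'$ is given by

$$\hat{f}_{s:o} = 1 - \frac{1}{n} \sum_{i=1}^{n} \hat{P}_{s:o}(A_{i,0} \mid C \cap C') \tag{19}$$

$$\phantom{\hat{f}_{s:o}} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n'} \hat{P}_{s:o}(A_{i,j} \mid C \cap C'). \tag{20}$$

After some tedious calculations, one shows that

$$\frac{\partial^2 \ln L_{s:o}}{\partial f^2} = -\frac{\sum_{i=1}^{n} \bigl([1-f] - P_{s:o}[A_{i,0} \mid C \cap C']\bigr)^2}{f^2\,(1-f)^2} < 0 \tag{21}$$

for all $f$, so $\partial \ln L_{s:o}/\partial f$ has at most one zero in $[0, 1]$: $\hat{f}_{s:o}$ is unique.

An estimate of the fraction $f'$ of $K'$-sources with a counterpart may also be computed from

$$\hat{f}'_{s:o} = 1 - \frac{1}{n'} \sum_{j=1}^{n'} \hat{P}_{s:o}(A_{0,j} \mid C \cap C'). \tag{22}$$

One can easily check from Eqs. (20), (22) and (13) that $\hat{f}_{s:o}/n' \geq \hat{f}'_{s:o}/n$ in the several-to-one case.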
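The relation between Eqs. (19), (22) and (13) can be checked numerically with a few lines of Python. This is only a sketch under stated assumptions: `fractions_from_probabilities` is a hypothetical helper, and the matrix `p[i][j]` (column 0 holding the no-counterpart probability) is assumed to have been evaluated at the maximum likelihood estimate, so that the returned values are indeed the estimates.

```python
from math import prod


def fractions_from_probabilities(p):
    """Evaluate Eq. (19) and Eq. (22) from several-to-one probabilities.

    p[i][0] is P(A_{i,0} | C, C'); p[i][j], j >= 1, is P(A_{i,j} | C, C').
    Returns (f_hat, f_prime_hat).
    """
    n = len(p)
    n_prime = len(p[0]) - 1
    f_hat = 1.0 - sum(row[0] for row in p) / n
    # Eq. (13): M'_j is unmatched if no K-source is associated with it.
    p_unmatched = [prod(1.0 - row[j] for row in p) for j in range(1, n_prime + 1)]
    f_prime_hat = 1.0 - sum(p_unmatched) / n_prime
    return f_hat, f_prime_hat


# Toy probabilities (hypothetical): both K-sources compete for M'_1.
p = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]]
f_hat, f_prime_hat = fractions_from_probabilities(p)
```

With these toy values, $n\hat{f} = 1$ while $n'\hat{f}' = 0.75$: two $K$-sources sharing a single candidate count twice on the $K$ side but at most once on the $K'$ side.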
3.2.2. Uncertainties 

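Uncertainties on the unknown parameters may be computed from the covariance matrix $V$ of $\hat{x}$. For large numbers of sources, $V$ is asymptotically given by

$$(V^{-1})_{p,q} = -\Bigl(\frac{\partial^2 \ln L}{\partial x_p\, \partial x_q}\Bigr)_{x = \hat{x}} \tag{23}$$

(Kendall & Stuart 1979).

Let us write with a circumflex accent all the quantities calculated at $x = \hat{x}$. From

$$\frac{\partial^2 \ln L}{\partial x_p\, \partial x_q} = \frac{1}{P(C \mid C')}\, \frac{\partial^2 P(C \mid C')}{\partial x_p\, \partial x_q} - \frac{1}{P^2(C \mid C')}\, \frac{\partial P(C \mid C')}{\partial x_p}\, \frac{\partial P(C \mid C')}{\partial x_q},$$

one obtains

$$\frac{\partial^2 \ln \hat{L}}{\partial x_p\, \partial x_q} = \frac{1}{\hat{P}(C \mid C')}\, \frac{\partial^2 \hat{P}(C \mid C')}{\partial x_p\, \partial x_q}. \tag{24}$$

One has

$$\frac{\partial^2 P_{s:o}(C \mid C')}{\partial x_p\, \partial x_q} = \sum_{i=1}^{n} \sum_{j=0}^{n'} \frac{\partial^2 \ln \zeta_{i,j}}{\partial x_p\, \partial x_q}\, P_{s:o}(A_{i,j} \cap C \mid C') + \sum_{i=1}^{n} \sum_{j=0}^{n'} \frac{\partial \ln \zeta_{i,j}}{\partial x_p}\, \frac{\partial P_{s:o}(A_{i,j} \cap C \mid C')}{\partial x_q}. \tag{25}$$

For any product of strictly positive functions $g_k$ of some variable $y$,

$$\frac{\partial \prod_{k} g_k}{\partial y} = \sum_{i} \frac{\partial g_i}{\partial y} \prod_{k \neq i} g_k = \Bigl(\prod_{k} g_k\Bigr) \sum_{i} \frac{\partial \ln g_i}{\partial y}, \tag{26}$$

so, using Eq. (10),

$$\begin{aligned}
\frac{\partial P_{s:o}(A_{i,j} \cap C \mid C')}{\partial x_q} &= \Lambda\, \frac{\partial \zeta_{i,j}}{\partial x_q} \prod_{k=1;\, k \neq i}^{n} \sum_{j_k=0}^{n'} \zeta_{k,j_k} + \Lambda\, \zeta_{i,j} \sum_{k=1;\, k \neq i}^{n} \Bigl(\sum_{j_k=0}^{n'} \frac{\partial \zeta_{k,j_k}}{\partial x_q}\Bigr) \prod_{\ell=1;\, \ell \neq i,k}^{n} \sum_{j_\ell=0}^{n'} \zeta_{\ell,j_\ell} \\
&= \frac{\partial \ln \zeta_{i,j}}{\partial x_q}\, P_{s:o}(A_{i,j} \cap C \mid C') + P_{s:o}(A_{i,j} \mid C \cap C') \sum_{k=1;\, k \neq i}^{n} \sum_{j_k=0}^{n'} \frac{\partial \ln \zeta_{k,j_k}}{\partial x_q}\, P_{s:o}(A_{k,j_k} \cap C \mid C').
\end{aligned} \tag{27}$$

For $x = \hat{x}$,

$$\begin{aligned}
\sum_{k=1;\, k \neq i}^{n} \sum_{j_k=0}^{n'} \frac{\partial \ln \hat{\zeta}_{k,j_k}}{\partial x_q}\, \hat{P}_{s:o}(A_{k,j_k} \cap C \mid C') &= \sum_{k=1}^{n} \sum_{j_k=0}^{n'} \frac{\partial \ln \hat{\zeta}_{k,j_k}}{\partial x_q}\, \hat{P}_{s:o}(A_{k,j_k} \cap C \mid C') - \sum_{j_i=0}^{n'} \frac{\partial \ln \hat{\zeta}_{i,j_i}}{\partial x_q}\, \hat{P}_{s:o}(A_{i,j_i} \cap C \mid C') \\
&= -\sum_{j_i=0}^{n'} \frac{\partial \ln \hat{\zeta}_{i,j_i}}{\partial x_q}\, \hat{P}_{s:o}(A_{i,j_i} \cap C \mid C')
\end{aligned} \tag{28}$$

since the first term on the right-hand side of the first line is zero from Eq. (16). Finally, combining Eqs. (25), (27), (28) and dividing by $\hat{P}_{s:o}(C \mid C')$, we obtain

$$\frac{\partial^2 \ln \hat{L}_{s:o}}{\partial x_p\, \partial x_q} = \sum_{i=1}^{n} \sum_{j=0}^{n'} \Bigl(\frac{\partial^2 \ln \hat{\zeta}_{i,j}}{\partial x_p\, \partial x_q} + \frac{\partial \ln \hat{\zeta}_{i,j}}{\partial x_p}\, \frac{\partial \ln \hat{\zeta}_{i,j}}{\partial x_q}\Bigr)\, \hat{P}_{s:o}(A_{i,j} \mid C \cap C') - \sum_{i=1}^{n} \Bigl[\sum_{j=0}^{n'} \frac{\partial \ln \hat{\zeta}_{i,j}}{\partial x_p}\, \hat{P}_{s:o}(A_{i,j} \mid C \cap C')\Bigr] \Bigl[\sum_{j=0}^{n'} \frac{\partial \ln \hat{\zeta}_{i,j}}{\partial x_q}\, \hat{P}_{s:o}(A_{i,j} \mid C \cap C')\Bigr]. \tag{29}$$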

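When $f$ is the only unknown parameter, the covariance matrix reduces to a single variance, and the second derivative of Eq. (21) gives the asymptotic standard deviation of the estimate directly. The Python sketch below illustrates this special case only; `sigma_f` is a hypothetical name, and the input probabilities are assumed to have been evaluated at the maximum likelihood estimate.

```python
import math


def sigma_f(p_no_match, f_hat):
    """Asymptotic standard deviation of f_hat when f is the only unknown.

    p_no_match[i] is P(A_{i,0} | C, C') evaluated at f = f_hat.
    Uses d2 ln L / df2 from Eq. (21) and V = -1 / (d2 ln L / df2).
    """
    curvature = -sum(((1.0 - f_hat) - p0) ** 2 for p0 in p_no_match)
    curvature /= (f_hat * (1.0 - f_hat)) ** 2
    return math.sqrt(-1.0 / curvature)


# Toy values (hypothetical): two sources, estimate f_hat = 0.5.
s = sigma_f([0.25, 0.75], 0.5)
```

The result is only meaningful for large numbers of sources, as the asymptotic formula requires; the two-source example is there to make the arithmetic easy to follow, not to represent a realistic sample.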
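3.3. Probability of association: local computation

In the several-to-one case, a purely local computation (subscript "loc" hereafter) of the probability of association between a given $M_i$ and some $M'_j$ ($j > 0$), or of the probability that $M_i$ has no counterpart in $K'$, is also possible.

Let us consider a region $D_i$, of area $S_i$, containing the position of $M_i$, and such that we can safely hypothesize that the $K'$-counterpart of $M_i$, if any, will be inside. We assume that the local surface density $\rho'_i$ of $K'$-sources unrelated to $M_i$ is uniform on $D_i$. To avoid biasing the estimate if $M_i$ has a counterpart, $\rho'_i$ may be computed from the number of $K'$-sources in a region surrounding, but not overlapping, $D_i$.

Besides the $A_{i,j}$, we consider the following events:

- $N'_i$: $D_i$ contains $n'_i$ sources;

- $C'_i := \bigcap_{j \in I_i} c'_j$, where $I_i := \{j \mid M'_j \in D_i\}$.

We want to compute the probability that a source $M'_j$ in $D_i$ is the counterpart of $M_i$, given the positions of the neighbors, i.e. $P_{loc}(A_{i,j} \mid C'_i \cap N'_i)$. We have

$$\begin{aligned}
P_{loc}(A_{i,j} \mid C'_i \cap N'_i) &= \frac{P_{loc}(A_{i,j} \cap C'_i \cap N'_i)}{P_{loc}(C'_i \cap N'_i)} = \frac{P_{loc}(C'_i \cap A_{i,j} \cap N'_i)}{P_{loc}(C'_i \cap \biguplus_{k \in I_i \cup \{0\}} A_{i,k} \cap N'_i)} \\
&= \frac{P_{loc}(C'_i \cap A_{i,j} \cap N'_i)}{\sum_{k \in I_i \cup \{0\}} P_{loc}(C'_i \cap A_{i,k} \cap N'_i)} = \frac{P_{loc}(C'_i \mid A_{i,j} \cap N'_i)\, P_{loc}(A_{i,j} \cap N'_i)}{\sum_{k \in I_i \cup \{0\}} P_{loc}(C'_i \mid A_{i,k} \cap N'_i)\, P_{loc}(A_{i,k} \cap N'_i)} \\
&= \frac{P_{loc}(C'_i \mid A_{i,j} \cap N'_i)\, P_{loc}(A_{i,j} \mid N'_i)}{\sum_{k \in I_i \cup \{0\}} P_{loc}(C'_i \mid A_{i,k} \cap N'_i)\, P_{loc}(A_{i,k} \mid N'_i)}.
\end{aligned}$$

If $j > 0$, $P_{loc}(A_{i,j} \mid N'_i) = P_{loc}(\bigcup_{k \in I_i} A_{i,k} \mid N'_i)/n'_i$ (we see here why event $N'_i$ was introduced: otherwise, $P_{loc}(A_{i,j})$ could not be computed as $P_{loc}(\bigcup_{k \in I_i} A_{i,k})/n'_i$ because $n'_i$ would be undefined). Now,

$$P_{loc}\Bigl(\bigcup_{k \in I_i} A_{i,k} \Bigm| N'_i\Bigr) = \frac{P_{loc}(N'_i \cap \bigcup_{k \in I_i} A_{i,k})}{P_{loc}(N'_i)} = \frac{P_{loc}(N'_i \mid \bigcup_{k \in I_i} A_{i,k})\, P_{loc}(\bigcup_{k \in I_i} A_{i,k})}{P_{loc}(N'_i \mid A_{i,0})\, P_{loc}(A_{i,0}) + P_{loc}(N'_i \mid \bigcup_{k \in I_i} A_{i,k})\, P_{loc}(\bigcup_{k \in I_i} A_{i,k})}.$$

If clustering is negligible, the number of sources randomly distributed with a mean surface density $\rho'_i$ in an area $S_i$ follows a Poissonian distribution, so

$$P_{loc}\Bigl(N'_i \Bigm| \bigcup_{k \in I_i} A_{i,k}\Bigr) = \frac{(\rho'_i S_i)^{n'_i - 1} \exp(-\rho'_i S_i)}{(n'_i - 1)!} \quad (n'_i - 1 \text{ random sources in } S_i)$$

and

$$P_{loc}(N'_i \mid A_{i,0}) = \frac{(\rho'_i S_i)^{n'_i} \exp(-\rho'_i S_i)}{n'_i!} \quad (n'_i \text{ random sources in } S_i).$$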

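Thus,

$$P_{loc}(A_{i,j} \mid N'_i) = \begin{cases} \dfrac{f}{n'_i\, f + (1-f)\, \rho'_i S_i} & \text{if } j > 0, \\[2ex] \dfrac{(1-f)\, \rho'_i S_i}{n'_i\, f + (1-f)\, \rho'_i S_i} & \text{if } j = 0. \end{cases}$$

For $j > 0$,

$$P_{loc}(C'_i \mid A_{i,j} \cap N'_i) = \xi_{i,j}\, \mathrm{d}^2 r'_j \prod_{k \in I_i;\, k \neq j} \frac{\mathrm{d}^2 r'_k}{S_i}$$

(rigorously, $\xi_{i,j}$ should be replaced by $\xi_{i,j}/P_{loc}(M'_j \in D_i \mid A_{i,j})$, but $P_{loc}(M'_j \notin D_i \mid A_{i,j})$ is negligible), and

$$P_{loc}(C'_i \mid A_{i,0} \cap N'_i) = \prod_{k \in I_i} \frac{\mathrm{d}^2 r'_k}{S_i}.$$

Finally,

$$P_{loc}(A_{i,j} \mid C'_i \cap N'_i) = \begin{cases} \dfrac{f\, \mathrm{LR}_{i,j}}{(1-f) + f \sum_{k \in I_i} \mathrm{LR}_{i,k}} & \text{if } j > 0, \\[2ex] \dfrac{1-f}{(1-f) + f \sum_{k \in I_i} \mathrm{LR}_{i,k}} & \text{if } j = 0, \end{cases} \tag{30}$$

where $\mathrm{LR}_{i,k} := \xi_{i,k}/\rho'_i$ is the "likelihood ratio". Mutatis mutandis, one obtains the same result as Eq. (14) of Pineau et al. (2011) and aforementioned authors. When extended to the all sky (i.e. $S_i = S$), $\rho'_i$ is replaced by $n'/S$ in Eq. (30), $\sum_{k \in I_i}$ by $\sum_{k=1}^{n'}$, and one recovers Eq. (12) since $\xi_{i,0} = 1/S$ for a uniform distribution.

The index $j_i$ of the most likely counterpart $M'_{j_i}$ of $M_i$ is the value of $j > 0$ maximizing $\mathrm{LR}_{i,j}$. Usually, $\sum_{k=1;\, k \neq j_i}^{n'} \mathrm{LR}_{i,k} \ll \mathrm{LR}_{i,j_i}$, so

$$P_{s:o}(A_{i,j_i} \mid C \cap C') \approx \frac{f\, \mathrm{LR}_{i,j_i}}{(1-f) + f\, \mathrm{LR}_{i,j_i}}.$$

As a "poor man's" recipe, if the value of $f$ is unknown and not too close to either 0 or 1, an association may be considered as true if $\mathrm{LR}_{i,j_i} \gg 1$ and as false if $\mathrm{LR}_{i,j_i} \ll 1$. Where to set the boundary between true associations and false ones is somewhat arbitrary. For a large sample, however, $f$ can be determined from the distribution of the positions of all sources, as shown in Sect. 3.2.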
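The local formula of Eq. (30) translates into a few lines of code. The Python sketch below is an illustration under stated assumptions, not the Aspects implementation: `local_probabilities` is a hypothetical name, `xi_local` holds the densities $\xi_{i,k}$ of the $K'$-sources found in the region around one $K$-source, and `rho` is the local surface density of unrelated $K'$-sources, expressed in the same units as the densities.

```python
def local_probabilities(xi_local, rho, f):
    """Local association probabilities for one source M_i (Eq. 30).

    Returns [P(no counterpart), P(A_i,k1), P(A_i,k2), ...] for the
    K'-sources of the neighborhood, in the order of xi_local.
    """
    lr = [x / rho for x in xi_local]          # likelihood ratios LR_{i,k}
    denom = (1.0 - f) + f * sum(lr)
    return [(1.0 - f) / denom] + [f * value / denom for value in lr]


# Toy neighborhood (hypothetical values): two candidates around M_i.
probs = local_probabilities([3.0, 1.0], rho=1.0, f=0.5)
best = max(range(1, len(probs)), key=lambda k: probs[k])  # most likely candidate
```

Here the candidate with the largest likelihood ratio is also the most probable counterpart, but its probability (0.6 in this toy case) depends on $f$ and on the other candidates, which is why a threshold on the likelihood ratio alone remains somewhat arbitrary.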



4. One-to-one associations

In Sect. 3, a given $M'_j$ may be associated with several $M_i$: the probabilities are actually asymmetric in the $M_i$ and $M'_j$, and, while $\sum_{j=0}^{n'} P_{s:o}(A_{i,j} \mid C \cap C') = 1$ for all $M_i$, one may well have $\sum_{i=1}^{n} P_{s:o}(A_{i,j} \mid C \cap C') > 1$ for some sources $M'_j$.

Here, we assume not only that each $K$-source is associated with at most one $K'$-source, but that each $K'$-source is associated with at most one $K$-source. We call this the "one-to-one" case and note with a subscript "o:o" all the quantities calculated under this assumption. As far as we know and despite some attempt by Rutledge et al. (2000), this problem has not been solved previously.

Since a $K'$-potential counterpart of $M_i$ within some neighborhood $D_i$ of $M_i$ might in fact be the true counterpart of another source $M_k$ outside of $D_i$, there is no obvious way to extend the exact local several-to-one computation of Sect. 3.3 to the one-to-one case. We therefore have to consider either the whole sky, as in Sect. 3.1, or at least some large enough region around both $M_i$ and $M'_j$ to neglect side effects.

In the case of one-to-one associations, a source of $K$ and a source of $K'$ play symmetrical roles; in particular, $P_{o:o}(A_{i,j}) = f/n' = f'/n$. For practical reasons (cf. Eq. (35)), we name $K$ the catalog with the fewer objects and $K'$ the other one, so $n \leq n'$ in the following (case $n > n'$ will be considered in App. B).



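4.1. Probability of association

We want to compute $P_{o:o}(A_{i,j} \mid C \cap C')$ for $i > 0$. We still have

$$P_{o:o}(A_{i,j} \mid C \cap C') = \frac{P_{o:o}(A_{i,j} \cap C \mid C')}{P_{o:o}(C \mid C')} \tag{31}$$

and

$$P_{o:o}(C \mid C') = P_{o:o}\Bigl(C \cap \bigcup_{j_1=0}^{n'} \bigcup_{j_2=0}^{n'} \cdots \bigcup_{j_n=0}^{n'} \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Bigr).$$

As $A_{i,j} \cap A_{k,\ell} = \emptyset$ if $i \neq k$ and $j = \ell > 0$, this reduces to

$$P_{o:o}(C \mid C') = P_{o:o}\Bigl(C \cap \biguplus_{\substack{j_1=0 \\ j_1 \notin J_0}}^{n'} \biguplus_{\substack{j_2=0 \\ j_2 \notin J_1}}^{n'} \cdots \biguplus_{\substack{j_n=0 \\ j_n \notin J_{n-1}}}^{n'} \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Bigr),$$

where $J_0 := \emptyset$ and $J_k$ is defined iteratively for all $k \in [\![1,n]\!]$ by $J_k := (J_{k-1} \cup \{j_k\}) \setminus \{0\}$. Hence,

$$\begin{aligned}
P_{o:o}(C \mid C') &= \sum_{\substack{j_1=0 \\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0 \\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0 \\ j_n \notin J_{n-1}}}^{n'} P_{o:o}\Bigl(C \cap \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Bigr) \\
&= \sum_{\substack{j_1=0 \\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0 \\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0 \\ j_n \notin J_{n-1}}}^{n'} P_{o:o}\Bigl(C \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Bigr)\, P_{o:o}\Bigl(\bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Bigr).
\end{aligned} \tag{32}$$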

As in the several-to-one case,

$$P_{o:o}\Bigl(C \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Bigr) = \Lambda \prod_{k=1}^{n} \xi_{k,j_k}. \tag{33}$$

We now have to compute $P_{o:o}(\bigcap_{k=1}^{n} A_{k,j_k} \mid C') = P_{o:o}(\bigcap_{k=1}^{n} A_{k,j_k})$. Let $X$ be a random variable describing the number of associations between $K$ and $K'$:

$$P_{o:o}\Bigl(\bigcap_{k=1}^{n} A_{k,j_k}\Bigr) = P_{o:o}\Bigl(\bigcap_{k=1}^{n} A_{k,j_k} \Bigm| X = m\Bigr)\, P_{o:o}(X = m) + P_{o:o}\Bigl(\bigcap_{k=1}^{n} A_{k,j_k} \Bigm| X \neq m\Bigr)\, P_{o:o}(X \neq m).$$

Since $P_{o:o}(\bigcap_{k=1}^{n} A_{k,j_k} \mid X \neq m) = 0$, one just has to compute $P_{o:o}(\bigcap_{k=1}^{n} A_{k,j_k} \mid X = m)$ and $P_{o:o}(X = m)$.

There are $n!/(m!\,[n-m]!)$ choices of $m$ elements among $n$ in $K$, and $n'!/(m!\,[n'-m]!)$ of $m$ elements among $n'$ in $K'$. The number of permutations of $m$ elements is $m!$, so the total number of one-to-one associations of $m$ elements from $K$ to $m$ elements of $K'$ is

$$m!\; \frac{n!}{m!\,(n-m)!}\; \frac{n'!}{m!\,(n'-m)!}.$$

The inverse of this number is

$$P_{o:o}\Bigl(\bigcap_{k=1}^{n} A_{k,j_k} \Bigm| X = m\Bigr) = \frac{m!\,(n-m)!\,(n'-m)!}{n!\; n'!}. \tag{34}$$

With our definition of $K$ and $K'$, $n \leq n'$, so all the elements of $K$ may have a counterpart in $K'$ jointly. Therefore, $P_{o:o}(X = m)$ is given by the binomial law:

$$P_{o:o}(X = m) = \frac{n!}{m!\,(n-m)!}\; f^m\, (1-f)^{n-m}. \tag{35}$$
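The counting argument behind Eq. (34) and its combination with the binomial law of Eq. (35) can be verified by brute force for small catalogs. The Python sketch below is such a check; `count_one_to_one` and `joint_prior` are hypothetical names introduced for this illustration, and the enumeration is only practical for very small $n$, $n'$ and $m$.

```python
from itertools import combinations, permutations
from math import comb, factorial


def count_one_to_one(n, n_prime, m):
    """Count the pairings of m sources of K with m distinct sources of K'."""
    return sum(1
               for _ in combinations(range(n), m)          # which K-sources
               for _ in permutations(range(n_prime), m))   # their counterparts


def joint_prior(n, n_prime, m, f):
    """Prior probability of one given pairing containing m associations."""
    given_m = 1.0 / (factorial(m) * comb(n, m) * comb(n_prime, m))  # Eq. (34)
    p_m = comb(n, m) * f ** m * (1.0 - f) ** (n - m)                # Eq. (35)
    return given_m * p_m


total = count_one_to_one(3, 4, 2)
prior = joint_prior(3, 4, 2, 0.5)
```

Multiplying Eqs. (34) and (35) cancels the factor $n!/(m!\,[n-m]!)$ and leaves $(n'-m)!/n'! \times f^m (1-f)^{n-m}$, which the test below compares with `joint_prior`.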



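From the preceding equations, we obtain

$$P_{o:o}(C \mid C') = \Lambda \sum_{\substack{j_1=0 \\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0 \\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0 \\ j_n \notin J_{n-1}}}^{n'} \frac{(n'-m)!}{n'!}\, f^m\, (1-f)^{n-m} \prod_{k=1}^{n} \xi_{k,j_k} = \Lambda \sum_{\substack{j_1=0 \\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0 \\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0 \\ j_n \notin J_{n-1}}}^{n'} \prod_{k=1}^{n} \eta_{k,j_k}, \tag{36}$$

where $\eta_{k,0} := \zeta_{k,0}$ and $\eta_{k,j_k} := f\, \xi_{k,j_k}/(n' - \#J_{k-1})$ if $j_k > 0$.

$P_{o:o}(A_{i,j} \cap C \mid C')$ is computed in the same way as $P_{o:o}(C \mid C')$:

$$\begin{aligned}
P_{o:o}(A_{i,j} \cap C \mid C') &= P_{o:o}\Bigl(C \cap A_{i,j} \cap \biguplus_{j_1=0}^{n'} \cdots \biguplus_{j_{i-1}=0}^{n'} \biguplus_{j_{i+1}=0}^{n'} \cdots \biguplus_{j_n=0}^{n'} \bigcap_{k=1;\, k \neq i}^{n} A_{k,j_k} \Bigm| C'\Bigr) \\
&= P_{o:o}\Bigl(C \cap \biguplus_{\substack{j_1=0 \\ j_1 \notin J^*_0}}^{n'} \cdots \biguplus_{\substack{j_{i-1}=0 \\ j_{i-1} \notin J^*_{i-2}}}^{n'} \biguplus_{\substack{j_{i+1}=0 \\ j_{i+1} \notin J^*_{i}}}^{n'} \cdots \biguplus_{\substack{j_n=0 \\ j_n \notin J^*_{n-1}}}^{n'} \bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Bigr),
\end{aligned}$$

where $j_i := j$, $J^*_0 := \{j\} \setminus \{0\}$ and $J^*_k := (J^*_{k-1} \cup \{j_k\}) \setminus \{0\}$ for all $k \in [\![1,n]\!]$, so

$$P_{o:o}(A_{i,j} \cap C \mid C') = \sum_{\substack{j_1=0 \\ j_1 \notin J^*_0}}^{n'} \cdots \sum_{\substack{j_{i-1}=0 \\ j_{i-1} \notin J^*_{i-2}}}^{n'} \sum_{\substack{j_{i+1}=0 \\ j_{i+1} \notin J^*_{i}}}^{n'} \cdots \sum_{\substack{j_n=0 \\ j_n \notin J^*_{n-1}}}^{n'} P_{o:o}\Bigl(C \Bigm| \bigcap_{k=1}^{n} A_{k,j_k} \cap C'\Bigr)\, P_{o:o}\Bigl(\bigcap_{k=1}^{n} A_{k,j_k} \Bigm| C'\Bigr).$$

Let $m^* := \#J^*_n$. As for $P_{o:o}(C \mid C')$,

$$\begin{aligned}
P_{o:o}(A_{i,j} \cap C \mid C') &= \Lambda \sum_{\substack{j_1=0 \\ j_1 \notin J^*_0}}^{n'} \cdots \sum_{\substack{j_{i-1}=0 \\ j_{i-1} \notin J^*_{i-2}}}^{n'} \sum_{\substack{j_{i+1}=0 \\ j_{i+1} \notin J^*_{i}}}^{n'} \cdots \sum_{\substack{j_n=0 \\ j_n \notin J^*_{n-1}}}^{n'} \frac{(n'-m^*)!}{n'!}\, f^{m^*} (1-f)^{n-m^*} \prod_{k=1}^{n} \xi_{k,j_k} \\
&= \Lambda\, \zeta_{i,j} \sum_{\substack{j_1=0 \\ j_1 \notin J^*_0}}^{n'} \cdots \sum_{\substack{j_{i-1}=0 \\ j_{i-1} \notin J^*_{i-2}}}^{n'} \sum_{\substack{j_{i+1}=0 \\ j_{i+1} \notin J^*_{i}}}^{n'} \cdots \sum_{\substack{j_n=0 \\ j_n \notin J^*_{n-1}}}^{n'} \prod_{k=1;\, k \neq i}^{n} \eta^*_{k,j_k},
\end{aligned} \tag{37}$$

where $\eta^*_{k,0} := \zeta_{k,0}$ and $\eta^*_{k,j_k} := f\, \xi_{k,j_k}/(n' - \#J^*_{k-1})$ if $j_k > 0$.

Finally, from the expressions above,

$$P_{o:o}(A_{i,j} \mid C \cap C') = \frac{\zeta_{i,j} \sum_{\substack{j_1=0 \\ j_1 \notin J^*_0}}^{n'} \cdots \sum_{\substack{j_{i-1}=0 \\ j_{i-1} \notin J^*_{i-2}}}^{n'} \sum_{\substack{j_{i+1}=0 \\ j_{i+1} \notin J^*_{i}}}^{n'} \cdots \sum_{\substack{j_n=0 \\ j_n \notin J^*_{n-1}}}^{n'} \prod_{k=1;\, k \neq i}^{n} \eta^*_{k,j_k}}{\sum_{\substack{j_1=0 \\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0 \\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0 \\ j_n \notin J_{n-1}}}^{n'} \prod_{k=1}^{n} \eta_{k,j_k}}. \tag{38}$$

The probability that a source $M'_j$ has no counterpart in $K$ is simply given by

$$P_{o:o}(A_{0,j} \mid C \cap C') = 1 - \sum_{k=1}^{n} P_{o:o}(A_{k,j} \mid C \cap C').$$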
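For very small catalogs, the one-to-one probabilities can be obtained by enumerating every admissible set of associations, each weighted by $(n'-m)!/n'! \times f^m (1-f)^{n-m} \prod_k \xi_{k,j_k}$. The Python sketch below does exactly that; it is a brute-force illustration under stated assumptions (hypothetical function name, list-of-lists inputs, $n \leq n'$), and its cost grows exponentially with $n$, so it is useful only as a reference for checking faster approximations.

```python
from itertools import product
from math import factorial


def one_to_one_probabilities(xi, xi0, f):
    """Brute-force P_o:o(A_{i,j} | C, C') for tiny catalogs with n <= n'.

    xi[k][j-1] is xi_{k+1,j} (j = 1..n'); xi0[k] is xi_{k+1,0}.
    Returns p with p[k][0] = P(no counterpart) and p[k][j] = P(A_{k+1,j}).
    """
    n, n_prime = len(xi), len(xi[0])
    p = [[0.0] * (n_prime + 1) for _ in range(n)]
    total = 0.0
    for js in product(range(n_prime + 1), repeat=n):
        matched = [j for j in js if j > 0]
        if len(set(matched)) != len(matched):
            continue  # a K'-source would be used twice: not one-to-one
        m = len(matched)
        weight = factorial(n_prime - m) / factorial(n_prime)
        weight *= f ** m * (1.0 - f) ** (n - m)
        for k, j in enumerate(js):
            weight *= xi0[k] if j == 0 else xi[k][j - 1]
        total += weight
        for k, j in enumerate(js):
            p[k][j] += weight
    return [[value / total for value in row] for row in p]


# Toy example (hypothetical values): two sources in each catalog.
p = one_to_one_probabilities([[1.0, 1.0], [1.0, 1.0]], [1.0, 1.0], f=0.5)
unmatched = [1.0 - sum(row[j] for row in p) for j in (1, 2)]  # P(A_0j | C, C')
```

With a single $K$-source the enumeration reduces to the several-to-one formula, which gives a simple consistency check, and the sum over $K$-sources of the probabilities attached to a given $K'$-source never exceeds one, unlike in the several-to-one case.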



4.2. Fraction of sources with a counterpart and other unknown parameters
4.2.1. Maximum likelihood estimates

As in the several-to-one case, an estimate jCo:o of the set x of unknown parameters may be obtained by solving Eq. ( fT4b (with the 
constraint /o:o e [0, «/«']). In the one-to-one case, one obtains from Eq. ( l36l l that 

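$$L_{o:o} = \Bigl(\sum_{\substack{j_1=0 \\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0 \\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0 \\ j_n \notin J_{n-1}}}^{n'} \prod_{k=1}^{n} \eta_{k,j_k}\Bigr) \prod_{k=1}^{n'} \xi_{0,k}. \tag{39}$$

As the number of terms in Eq. (39) grows exponentially with $n$ and $n'$, this equation seems useless (see nonetheless Sect. 5.2.2 for a practical computation of $L_{o:o}$). In fact, the prior computation of $L_{o:o}$ is not necessary if the probabilities $P_{o:o}(A_{i,j} \mid C \cap C')$ are known (we will see how to approximate these in Sect. 5.2.1).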

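Indeed, for any parameter $x_p$, let us show that we get the same result (Eq. (16)) as in the several-to-one case. Using Eq. (26), we obtain

$$\frac{\partial P_{o:o}(C \mid C')}{\partial x_p} = \Lambda \sum_{\substack{j_1=0 \\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0 \\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0 \\ j_n \notin J_{n-1}}}^{n'} \sum_{i=1}^{n} \frac{\partial \ln \eta_{i,j_i}}{\partial x_p} \prod_{k=1}^{n} \eta_{k,j_k}. \tag{40}$$

The expression of $P_{o:o}(A_{i,j} \cap C \mid C')$ may also be written

$$P_{o:o}(A_{i,j} \cap C \mid C') = \Lambda \sum_{\substack{j_1=0 \\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0 \\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0 \\ j_n \notin J_{n-1}}}^{n'} \mathbb{1}(j_i = j) \prod_{k=1}^{n} \eta_{k,j_k},$$

where $\mathbb{1}$ is the indicator function (i.e. $\mathbb{1}(j_i = j) = 1$ if proposition "$j_i = j$" is true and $\mathbb{1}(j_i = j) = 0$ otherwise), so

$$\begin{aligned}
\sum_{i=1}^{n} \sum_{j=0}^{n'} \frac{\partial \ln \zeta_{i,j}}{\partial x_p}\, P_{o:o}(A_{i,j} \cap C \mid C') &= \Lambda \sum_{i=1}^{n} \sum_{\substack{j_1=0 \\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0 \\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0 \\ j_n \notin J_{n-1}}}^{n'} \sum_{j=0}^{n'} \mathbb{1}(j_i = j)\, \frac{\partial \ln \zeta_{i,j}}{\partial x_p} \prod_{k=1}^{n} \eta_{k,j_k} \\
&= \Lambda \sum_{i=1}^{n} \sum_{\substack{j_1=0 \\ j_1 \notin J_0}}^{n'} \sum_{\substack{j_2=0 \\ j_2 \notin J_1}}^{n'} \cdots \sum_{\substack{j_n=0 \\ j_n \notin J_{n-1}}}^{n'} \frac{\partial \ln \zeta_{i,j_i}}{\partial x_p} \prod_{k=1}^{n} \eta_{k,j_k}.
\end{aligned} \tag{41}$$

If $j_i = 0$, then $\eta_{i,j_i} = \zeta_{i,j_i}$; and if $j_i > 0$, the numerators of $\eta_{i,j_i}$ and $\zeta_{i,j_i}$ are the same and their denominators do not depend on $x_p$: in all cases, $\partial \ln \eta_{i,j_i}/\partial x_p = \partial \ln \zeta_{i,j_i}/\partial x_p$. The right-hand sides of Eqs. (40) and (41) are therefore identical. Dividing their left-hand sides by $P_{o:o}(C \mid C')$, one obtains again

$$\frac{\partial \ln L_{o:o}}{\partial x_p} = \sum_{i=1}^{n} \sum_{j=0}^{n'} \frac{\partial \ln \zeta_{i,j}}{\partial x_p}\, P_{o:o}(A_{i,j} \mid C \cap C'). \tag{42}$$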



M. Fioc: Probabilistic positional association of catalogs of astrophysical sources: the Aspects code 



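For $x_p = f$, one still has $\partial \ln \zeta_{i,0}/\partial f = -1/(1-f)$ and $\partial \ln \zeta_{i,j}/\partial f = 1/f$ if $j > 0$, so, as in the several-to-one case,

$$\frac{\partial \ln L_{o:o}}{\partial f} = \frac{n\,(1-f) - \sum_{i=1}^{n} P_{o:o}(A_{i,0} \mid C \cap C')}{f\,(1-f)}. \tag{43}$$

4.2.2. Uncertainties

Regarding uncertainties on the $x_p$, Eqs. (23), (24) and (25) are valid in the one-to-one case too, so, from Eq. (42),

$$\frac{\partial^2 \ln \hat{L}_{o:o}}{\partial x_p\, \partial x_q} = \sum_{i=1}^{n} \sum_{j=0}^{n'} \frac{\partial^2 \ln \hat{\zeta}_{i,j}}{\partial x_p\, \partial x_q}\, \hat{P}_{o:o}(A_{i,j} \mid C \cap C') + \sum_{i=1}^{n} \sum_{j=0}^{n'} \frac{\partial \ln \hat{\zeta}_{i,j}}{\partial x_p}\, \frac{\partial \hat{P}_{o:o}(A_{i,j} \mid C \cap C')}{\partial x_q}.$$

Contrary to the several-to-one case, no simple exact analytic expression of the terms $\partial P_{o:o}(A_{i,j} \mid C \cap C')/\partial x_q$ could be obtained. These terms might be computed numerically, but it is simpler to determine the second derivatives of $L_{o:o}$ directly from Eq. (47) using finite differences.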



5. Practical implementation

5.1. Several-to-one case

5.1.1. A matter of neighborhood...

In the several-to-one case, the computation of the probability of association $P_{s:o}(A_{i,j} \mid C \cap C')$ between $M_i$ and $M'_j$ from Eq. (11) is straightforward if $f$ and the positional uncertainties are known. However, the number of calculations for the whole sample or for the determination of $\hat{x}$ is of the order of $n\, n'$.

As $\xi_{i,k}$ rapidly tends to 0 when the angular distance $\psi_{i,k}$ between $M_i$ and $M'_k$ increases, there is no need to sum from $k = 1$ to $n'$ in Eq. (11), nor to compute explicitly all the $P_{s:o}(A_{i,j} \mid C \cap C')$. If $R$ is some angular distance above which $\xi_{i,k} \ll \xi_{i,0}$, one may set $\xi_{i,k}$ to 0 (and $P_{s:o}(A_{i,k} \mid C \cap C')$ too) if $\psi_{i,k} > R$ and replace the sums $\sum_{k=1}^{n'}$ by $\sum_{k=1;\, \psi_{i,k} \leq R}^{n'}$.

In fact, for most $M_i$, one does not even need to test whether $\psi_{i,k} \leq R$ for each $M'_k \in K'$. Let us write $E_i$ the domain of right ascensions $\alpha'$ out of which no point $M'$ of declination $\delta'$ and closer to $M_i$ than distance $R$ may be found. The angular distance $\psi$ between $M'$ and $M_i$ is given (cf. Eq. (A.1)) by

$$\cos \psi = \cos(\alpha' - \alpha_i)\, \cos \delta_i\, \cos \delta' + \sin \delta_i\, \sin \delta'.$$

If $\delta_i \in [-\pi/2 + R,\ \pi/2 - R]$, the minimum of $\cos(\alpha' - \alpha_i)$ under the constraint $\cos \psi \geq \cos R$ is reached when $\sin \delta' = \sin \delta_i/\cos R$ and

$$\cos(\alpha' - \alpha_i) = \frac{\sqrt{\cos^2 R - \sin^2 \delta_i}}{\cos \delta_i}.$$

Let $\Delta_i := \arccos\bigl(\sqrt{\cos^2 R - \sin^2 \delta_i}/\cos \delta_i\bigr)$. The domain $E_i$ is given by

$$E_i = \begin{cases} [0,\ \alpha_i + \Delta_i - 2\pi] \cup [\alpha_i - \Delta_i,\ 2\pi] & \text{if } \alpha_i + \Delta_i > 2\pi, \\ [0,\ \alpha_i + \Delta_i] \cup [\alpha_i - \Delta_i + 2\pi,\ 2\pi] & \text{if } \alpha_i - \Delta_i < 0, \\ [\alpha_i - \Delta_i,\ \alpha_i + \Delta_i] & \text{otherwise.} \end{cases}$$

If $\delta_i \in [-\pi/2,\ -\pi/2 + R] \cup [\pi/2 - R,\ \pi/2]$, one has $E_i = [0, 2\pi]$.

For a catalog $K'$ ordered by increasing right ascension (if not, this is the first thing to do), one may easily find the subset of indices $k$ for which $\alpha'_k \in E_i$. For instance, if $E_i = [\alpha_i - \Delta_i,\ \alpha_i + \Delta_i]$, one just has to find by dichotomy the indices $k^-$ and $k^+$ such that $\alpha'_{k^- - 1} < \alpha_i - \Delta_i \leq \alpha'_{k^-}$ and $\alpha'_{k^+} \leq \alpha_i + \Delta_i < \alpha'_{k^+ + 1}$. The sums $\sum_{k=1;\, \psi_{i,k} \leq R}^{n'}$ may then be replaced by $\sum_{k=k^-;\, \psi_{i,k} \leq R}^{k^+}$.

In all cases, the sum may be further restricted to sources with a declination $\delta'_k \in [\delta_i - R,\ \delta_i + R] \cap [-\pi/2,\ \pi/2]$.
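The right-ascension window and the search by dichotomy can be sketched in Python with the standard `bisect` module. This is an illustration under stated assumptions rather than the Aspects implementation: the function names are hypothetical, angles are in radians, right ascensions lie in $[0, 2\pi]$, and `ra_sorted` is the list of right ascensions of the $K'$-sources already sorted in increasing order. The final tests on declination and on the angular distance itself are left to the caller.

```python
import bisect
import math

TWO_PI = 2.0 * math.pi


def ra_half_width(delta, radius):
    """Half-width Delta_i of the right-ascension window, or None near a pole."""
    if abs(delta) > math.pi / 2.0 - radius:
        return None  # the window covers all right ascensions
    arg = math.cos(radius) ** 2 - math.sin(delta) ** 2
    ratio = math.sqrt(max(arg, 0.0)) / math.cos(delta)
    return math.acos(min(ratio, 1.0))


def candidate_indices(ra_sorted, alpha, delta, radius):
    """Indices k of K'-sources whose right ascension lies in the domain E_i."""
    n = len(ra_sorted)
    half = ra_half_width(delta, radius)
    if half is None:
        return list(range(n))
    low, high = alpha - half, alpha + half
    if low < 0.0:  # window wraps around right ascension 0
        first = bisect.bisect_right(ra_sorted, high)
        second = bisect.bisect_left(ra_sorted, low + TWO_PI)
        return list(range(first)) + list(range(second, n))
    if high > TWO_PI:  # window wraps around right ascension 2*pi
        first = bisect.bisect_right(ra_sorted, high - TWO_PI)
        second = bisect.bisect_left(ra_sorted, low)
        return list(range(first)) + list(range(second, n))
    return list(range(bisect.bisect_left(ra_sorted, low),
                      bisect.bisect_right(ra_sorted, high)))


ra_sorted = [0.1, 1.0, 3.0, 6.2]                          # hypothetical K' catalog
near_zero = candidate_indices(ra_sorted, 0.05, 0.0, 0.2)  # window wraps around 0
```

Each lookup costs $O(\log n')$ plus the size of the returned window, instead of a scan of the whole catalog for every $K$-source.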



5.1.2. Fraction of sources with a counterpart

All the probabilities depend on $f$ and, possibly, other unknown parameters like $\mathring{\sigma}$ and $\nu$. Estimates of these parameters may be found by solving Eq. (14) using Eq. (16).

If the fraction of sources with a counterpart is the only unknown, the $\xi_{i,j}$ need to be computed only once and $\hat{f}_{s:o}$ may be easily determined from Eq. (19). Denote $g$ the function

$$g\colon [0, 1] \to \mathbb{R}, \tag{44}$$

$$f \mapsto 1 - \frac{1}{n} \sum_{i=1}^{n} P_{s:o}(A_{i,0} \mid C \cap C'), \tag{45}$$

and let us show that, for any $f_0 \in\, ]0, 1[$, the sequence $(f_k)_{k \in \mathbb{N}}$ defined by $f_{k+1} := g(f_k)$ tends to $\hat{f}_{s:o}$.

First, note that

$$g(f) = f + \frac{f\,(1-f)}{n}\, \frac{\partial \ln L_{s:o}}{\partial f}.$$

The only fixed points of $g$ are hence 0, 1 and $\hat{f}_{s:o}$. As $\partial^2 \ln L_{s:o}/\partial f^2 < 0$ (Eq. (21)), one has $\partial \ln L_{s:o}/\partial f \geq 0$ and thus $g(f) \geq f$ for $f \in [0, \hat{f}_{s:o}]$; similarly, $\partial \ln L_{s:o}/\partial f \leq 0$ and $g(f) \leq f$ for $f \in [\hat{f}_{s:o}, 1]$. Because $P_{s:o}(A_{i,0} \mid C \cap C')$ decreases for all $i$ when $f$ increases, $g$ is also an increasing function.

Consider the case $f_0 \in\, ]0, \hat{f}_{s:o}[$. If $f_k < \hat{f}_{s:o}$, $g(f_k) > f_k$ and $g(f_k) < g(\hat{f}_{s:o}) = \hat{f}_{s:o}$. As $g(f_k) = f_{k+1}$, $(f_k)_{k \in \mathbb{N}}$ is an increasing sequence bounded from above by $\hat{f}_{s:o}$: it therefore converges in $[f_0, \hat{f}_{s:o}]$. Because $g$ is continuous and $\hat{f}_{s:o}$ is the only fixed point in this interval, $(f_k)_{k \in \mathbb{N}}$ tends to $\hat{f}_{s:o}$.

Similarly, if $f_0 \in [\hat{f}_{s:o}, 1[$, $(f_k)_{k \in \mathbb{N}}$ is a decreasing sequence converging to $\hat{f}_{s:o}$.

Because of Eq. (43), and although it is not obvious that $P_{o:o}(A_{i,0} \mid C \cap C')$ decreases for all $i$ when $f$ increases, nor that $\partial^2 \ln L_{o:o}/\partial f^2 < 0$, this procedure also works in practice in the one-to-one case, with $P_{s:o}$ replaced by $P_{o:o}$ in Eq. (44). A good starting value $f_0$ may be $\hat{f}_{s:o}$.
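The iteration $f_{k+1} := g(f_k)$ is straightforward to code in the several-to-one case. The Python sketch below is a minimal illustration, not the Aspects implementation: `estimate_f` and its stopping rule (tolerance `tol`, cap `max_iter`) are assumptions of this example, and each row of `xi` is assumed to contain the densities $\xi_{i,k}$ for all $n'$ sources of $K'$ (zeros for distant sources).

```python
def estimate_f(xi, xi0, f0=0.5, tol=1e-12, max_iter=10000):
    """Fixed-point iteration f_{k+1} = g(f_k) in the several-to-one case.

    xi[i] lists xi_{i,k} for k = 1..n'; xi0[i] is xi_{i,0}.
    The starting value f0 must lie strictly between 0 and 1.
    """
    n = len(xi)
    f = f0
    for _ in range(max_iter):
        p0_sum = 0.0
        for row, x0 in zip(xi, xi0):
            no_match = (1.0 - f) * len(row) * x0
            p0_sum += no_match / (no_match + f * sum(row))  # P(A_i0 | C, C')
        f_new = 1.0 - p0_sum / n
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f


# Toy example (hypothetical values) whose maximum likelihood estimate is 0.5.
f_hat = estimate_f([[3.0], [1.0 / 3.0]], [1.0, 1.0], f0=0.2)
```

Convergence is monotonic but only linear, so a tight tolerance may need many iterations; since the densities are computed once and reused, each iteration stays cheap.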



5.2. One-to-one case

5.2.1. Computation of the probabilities of association

All that was said for the several-to-one case in Sect. 5.1.1 still holds in the one-to-one case. Because of the combinatorial explosion of the number of terms in Eq. (38), an exact computation of $P_{o:o}(A_{i,j} \mid C \cap C')$ seems hopeless. A sequence of approximations converging to the true value may however be built in the following way.

For any $M_i$, let $\phi$ be a permutation on $K$ ordering the elements $M_{\phi(1)}, M_{\phi(2)}, \ldots, M_{\phi(n)}$ by increasing angular distance to $M_i$. For $j = 0$ or $M'_j$ in the neighborhood of $M_i$, and for any $\ell \in [\![1,n]\!]$, define

Pe := — — ^ —, (46) 

2 ;,=o 2 j2=o ■ ■ ■ 2 jf=o Y\k=\ ''Ik, it 

where J*'" := {;) \ {0}, jf* := (7^:; U {jk]) \ {0} for all k e |[2,nl, := Jk for all k. 

As $\phi(1) = i$, $P_1 = P_{s:o}(A_{i,j} \mid C \cap C')$ (cf. Eq. (11)): at first order, we obtain the same result as in the several-to-one case. Since, if $M_i$ and $M'_j$ are close to each other, the influence of other $K$-sources on the result decreases very fast with their angular distance to $M_i$ and $M'_j$, $P_\ell$ rapidly converges to $P_n = P_{o:o}(A_{i,j} \mid C \cap C')$.

Because of the recursive sums in Eq. (46), the computation must in practice be further restricted to sources in the neighborhood of $M_i$ and $M'_j$, as explained in Sect. 5.1.1.

5.2.2. Computation of the likelihood

Consider Eq. ([SB for / = 1 and j = 0. One has P^JC \ C) = Po:o(Ai.o n C \ C')/Po:o(Ai,o \CnC') and 



h 
h 

since J\ - J\ - & for ; — 1 and j - 0. Similarly, 



72=0 j„=0 k=\ k=2 



k=2 

and 



P \r'nA \ ^o:o(A2,onnL2CdC'nAi,o) 



^"0:0(^2,0 1 nL2c^nC'nAi,o) 

n 2 



i=2 4=3 t=l 

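By iteration, one obtains

$$P_{o:o}(C \mid C') = \prod_{i=1}^{n} \frac{\zeta_{i,0}\, \mathrm{d}^2 r_i}{P_{o:o}\bigl(A_{i,0} \bigm| [\bigcap_{k=i}^{n} c_k] \cap C' \cap \bigcap_{k=1}^{i-1} A_{k,0}\bigr)}.$$

As $P_{o:o}(A_{i,0} \mid [\bigcap_{k=i}^{n} c_k] \cap C' \cap \bigcap_{k=1}^{i-1} A_{k,0}) = P_{o:o}(A_{i,0} \mid C \cap C' \cap \bigcap_{k=1}^{i-1} A_{k,0})$,

$$L_{o:o} = \Bigl(\prod_{i=1}^{n} \frac{\zeta_{i,0}}{P_{o:o}\bigl[A_{i,0} \bigm| C \cap C' \cap \bigcap_{k=1}^{i-1} A_{k,0}\bigr]}\Bigr) \prod_{j=1}^{n'} \xi_{0,j}. \tag{47}$$

This formula is actually also valid in the several-to-one case (with "o:o" replaced by "s:o"), but much less convenient than Eq. (15).

The probabilities $P_{o:o}(A_{i,0} \mid C \cap C' \cap \bigcap_{k=1}^{i-1} A_{k,0})$ can be computed in the same way as the $P_{o:o}(A_{i,j} \mid C \cap C')$ in Sect. 5.2.1: one can simply apply Eq. (46) with $\eta^*_{\phi(k),j_k} = \eta_{\phi(k),j_k} = 0$ for $j_k > 0$ and $\phi(k) < i$.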



6. The Aspects code 

To implement these results and test the efficiency and consistency of the formulae established here, we have built a Fortran 95 code, 
Aspects, a French acronym (pronounced [aspe] in International Phonetic Alphabet, not [aespekts]) for "Association positionnelle/ 
probabiliste de catalogues de sources", or "probabilistic positional association of catalogs of sources" in English. The source files 
of this code are freely availabl^^at www2 . iap. £r/users/fioc/Aspects/. 

Given two catalogs of sources with their positions and the uncertainties on these. Aspects computes, in the several-to-one, one-to- 
one and one-to-several cases, the likelihood L, the maximum likelihood estimates of / and /', and the probabilities PiAij \ C n C). 

The code compiles with Ifort and Gfortran. As computations in the one-to-one case are complex (they involve recursive sums for instance), we made several consistency checks. In particular, we swapped $K$ and $K'$ for $n \neq n'$, and compared quantities resulting from this swap (superscript "$\leftrightarrow$" hereafter) to original ones: we got $\hat{f}'^{\leftrightarrow}_{o:o} = \hat{f}_{o:o}$ and, for $f'^{\leftrightarrow} = f$, $L^{\leftrightarrow}_{o:o} = L_{o:o}$ and $P^{\leftrightarrow}_{o:o}(A_{j,i} \mid C \cap C') = P_{o:o}(A_{i,j} \mid C \cap C')$ for all $(M_i, M'_j)$. We also numerically checked Eq. (…). For small $n$ and $n'$ ($< 5$), we verified that Aspects returns the same numerical values as Mathematica (Wolfram 1996); and for even smaller values of $n$ and $n'$ ($< 3$), we made sure that analytical expressions derived by hand from the enumeration of all possible associations between $K$ and $K'$ are identical to Mathematica's symbolic calculations.
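The enumeration used in the last check can be illustrated by a short sketch. The Python below is not taken from Aspects (which is written in Fortran 95): the function names are hypothetical, and only the listing of one-to-one configurations is shown, not the probability attached to each of them.

```python
from itertools import combinations, permutations
from math import comb, factorial


def one_to_one_configurations(n, n_prime):
    """Yield every one-to-one association between K (n sources) and K' (n' sources).

    A configuration is a tuple (j_1, ..., j_n): j_i = 0 if M_i has no
    counterpart, otherwise j_i in 1..n' is the index of its counterpart.
    """
    for m in range(min(n, n_prime) + 1):           # m = number of associated pairs
        for sources in combinations(range(n), m):  # which K-sources are associated
            # ordered choice of m distinct K'-sources, one per associated K-source
            for partners in permutations(range(1, n_prime + 1), m):
                config = [0] * n
                for i, j in zip(sources, partners):
                    config[i] = j
                yield tuple(config)


def count_configurations(n, n_prime):
    """Closed-form number of configurations: sum over m of C(n,m) C(n',m) m!."""
    return sum(comb(n, m) * comb(n_prime, m) * factorial(m)
               for m in range(min(n, n_prime) + 1))
```

The number of configurations grows very quickly with $n$ and $n'$, which is why such a brute-force enumeration is only usable as a check for very small catalogs.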



7. Simulations 

7.1. Estimation of f if positional uncertainties are known 

We have built all-sky mock catalogs in the cases of several- and one-to-one associations. To do this, we first selected the indices of $fn$ objects in $K$, and randomly assigned to each of them the index of a counterpart in $K'$; for one-to-one simulations, a given $K'$-source was associated at most once. We then drew the true positions of $K'$-sources uniformly on the sky. The true positions of $K$-sources without counterpart were drawn in the same way; for sources with a counterpart, we took the true position of their counterpart. The observed positions of $K$- and $K'$-sources were finally computed with $a_i = b_i = \sigma$ (see notations in App. A) for all $M_i \in K$ and $a'_j = b'_j = \sigma'$ for all $M'_j \in K'$: positional uncertainty ellipses are therefore circular.
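The construction described above can be sketched as follows. This is a minimal illustration in Python, not the procedure coded in Aspects: the function names, the use of Python's `random` module and the small-angle displacement in the tangent plane are assumptions made for the example.

```python
import math
import random


def uniform_sky_position(rng):
    """Draw (ra, dec) in radians, uniformly distributed on the sphere."""
    return rng.uniform(0.0, 2.0 * math.pi), math.asin(rng.uniform(-1.0, 1.0))


def observe(rng, ra, dec, sigma):
    """Add a circular Gaussian error (standard deviation sigma per axis, in rad)
    in the plane tangent to the sphere at the true position."""
    d_east, d_north = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
    u_r = (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))
    u_alpha = (-math.sin(ra), math.cos(ra), 0.0)
    u_delta = (-math.sin(dec) * math.cos(ra), -math.sin(dec) * math.sin(ra), math.cos(dec))
    v = [u_r[k] + d_east * u_alpha[k] + d_north * u_delta[k] for k in range(3)]
    norm = math.sqrt(sum(c * c for c in v))
    x, y, z = (c / norm for c in v)
    return math.atan2(y, x) % (2.0 * math.pi), math.asin(max(-1.0, min(1.0, z)))


def mock_catalogs(n, n_prime, f, sigma, sigma_prime, one_to_one, seed=0):
    """Return observed positions of K and K' and, for each K-source, the index
    of its counterpart in K' (None if it has none)."""
    rng = random.Random(seed)
    n_assoc = round(f * n)                          # f n sources of K get a counterpart
    with_counterpart = rng.sample(range(n), n_assoc)
    if one_to_one:
        partners = rng.sample(range(n_prime), n_assoc)   # each K'-source used at most once
    else:
        partners = [rng.randrange(n_prime) for _ in range(n_assoc)]
    counterpart = [None] * n
    for i, j in zip(with_counterpart, partners):
        counterpart[i] = j
    true_kp = [uniform_sky_position(rng) for _ in range(n_prime)]
    # Associated K-sources share the true position of their counterpart.
    true_k = [true_kp[j] if j is not None else uniform_sky_position(rng)
              for j in counterpart]
    obs_k = [observe(rng, ra, dec, sigma) for ra, dec in true_k]
    obs_kp = [observe(rng, ra, dec, sigma_prime) for ra, dec in true_kp]
    return obs_k, obs_kp, counterpart
```

With `sigma = sigma_prime = 0`, associated sources fall at exactly the same observed position, which gives a simple sanity check of the bookkeeping.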

Only two parameters matter in that case: $f$ and $\mathring{\sigma} := (\sigma^2 + \sigma'^2)^{1/2}$. Hundreds of simulations were run for $f = 1/2$, $n' = 10^{?}$, $\mathring{\sigma} = 10^{?}$ rad and $n \in [10^{?}, 10^{?}]$. We analyzed them with Aspects in the case of known positional uncertainties, and plot the mean value of four different estimators of $f$ as a function of $n$ in Fig. 1 (error bars are smaller than the size of the points):

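- $\hat{f}_{s:o}$, the value maximizing $L_{s:o}$ (Eq. (19));

- $\hat{f}_{o:o}$, the value maximizing $L_{o:o}$ (Eq. (43));

- $\hat{f}_{o:s}$, the value of $f$ in the one-to-several ("o:s") case, i.e. when several $K'$-sources may be associated to a unique $K$-source. Estimator $\hat{f}_{o:s}$ is given by Eq. (22), with $K$ and $K'$ swapped:

\[ \hat{f}_{o:s} = 1 - \frac{1}{n} \sum_{i=1}^{n} P_{o:s}(A_{i,0} \mid C \cap C'), \]

where (cf. Eq. (13))

\[ P_{o:s}(A_{i,0} \mid C \cap C') = \prod_{j=1}^{n'} \bigl(1 - P_{o:s}[A_{i,j} \mid C \cap C']\bigr) \]

and, for $j \neq 0$ (cf. Eq. (12)),

\[ P_{o:s}(A_{i,j} \mid C \cap C') = \begin{cases} \dfrac{f'\, \xi_{i,j}}{(1 - f')\, n\, \xi_{0,j} + f' \sum_{k=1}^{n} \xi_{k,j}} & \text{if } i > 0, \\[2ex] \dfrac{(1 - f')\, n\, \xi_{0,j}}{(1 - f')\, n\, \xi_{0,j} + f' \sum_{k=1}^{n} \xi_{k,j}} & \text{if } i = 0. \end{cases} \tag{48} \]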

$^3$ It uses some Numerical Recipes Fortran 90 routines (Press et al. 1992) for sorting arrays, locating a value in an ordered table and generating uniform and Gaussian random deviates. Because of license constraints, we cannot provide these routines, but they may (and will in a future version of the code) be replaced by free equivalents.
Fig. 1. Mean value of different estimators $\hat{f}$ of $f$ as a function of $n$ for $f = 1/2$ (dotted line), $n' = 10^{?}$, $\mathring{\sigma} = 10^{?}$ rad and circular positional uncertainty ellipses (see Sect. 7.1 for details on the simulations). Plotted estimators: $\hat{f}_{s:o}$, $\hat{f}_{o:o}$, $\hat{f}_{o:s}$ and $\hat{f}^{*}_{o:s}$. (a) Several-to-one simulations. (b) One-to-one simulations (note that $\hat{f}_{s:o}$ and $\hat{f}^{*}_{o:s}$ overlap).



In these expressions, $f'$ is replaced by the value $\hat{f}'_{o:s}$ maximizing the one-to-several likelihood (cf. Eq. (15)).

- $\hat{f}^{*}_{o:s} := n' \hat{f}'_{o:s}/n$, an estimator which will be correct in the one-to-one case if $\hat{f}'_{o:s} \approx f'$ since one has then $f/n' = f'/n$.
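To make the one-to-several estimator concrete, the sketch below evaluates Eq. (48), the product giving $P_{o:s}(A_{i,0} \mid C \cap C')$ and the resulting $\hat{f}_{o:s}$ for given values of the $\xi_{i,j}$, $\xi_{0,j}$ and $f'$. It is an illustration in Python with hypothetical function names; it assumes that the $\xi$ values have already been computed and that $f'$ has been set to its maximum likelihood value beforehand.

```python
import math


def one_to_several_probabilities(xi, xi0, f_prime):
    """Eq. (48) for all pairs.

    xi[i][j] : xi_{i+1, j+1} for the n sources of K and the n' sources of K'
    xi0[j]   : xi_{0, j+1}
    Returns (prob, prob_unassociated) where prob[i][j] = P_o:s(A_{i+1,j+1} | C, C')
    and prob_unassociated[j] = P_o:s(A_{0,j+1} | C, C').
    """
    n, n_prime = len(xi), len(xi0)
    prob = [[0.0] * n_prime for _ in range(n)]
    prob_unassociated = []
    for j in range(n_prime):
        unassociated = (1.0 - f_prime) * n * xi0[j]
        denominator = unassociated + f_prime * sum(xi[i][j] for i in range(n))
        prob_unassociated.append(unassociated / denominator)
        for i in range(n):
            prob[i][j] = f_prime * xi[i][j] / denominator
    return prob, prob_unassociated


def estimate_f_one_to_several(prob):
    """f_o:s = 1 - (1/n) sum_i prod_j (1 - P_o:s(A_{i,j} | C, C'))."""
    n = len(prob)
    no_counterpart = [math.prod(1.0 - p for p in row) for row in prob]
    return 1.0 - sum(no_counterpart) / n


def rescaled_estimate(f_prime_os, n, n_prime):
    """Fourth estimator of the list: n' f'_o:s / n."""
    return n_prime * f_prime_os / n
```

For instance, with $n = n' = 2$, $f' = 1/2$, $\xi_{0,j} = 1$ and a single non-zero $\xi_{1,1} = 4$, the first function gives $P_{o:s}(A_{1,1} \mid C \cap C') = 2/3$ and the second one $\hat{f}_{o:s} = 1/3$.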

For several-to-one simulations (Fig. 1a), $\hat{f}_{s:o}$ is by far the best estimator of $f$ (just as $\hat{f}'_{o:s}$ would be the best estimator of $f'$ for one-to-several simulations) and does not show any significant bias, whatever the value of $n$. While correct for small values of $n$, $\hat{f}_{o:o}$ and $\hat{f}^{*}_{o:s}$ diverge from $f$ when $n$ increases, so there is no point in using them. Estimator $\hat{f}_{o:s}$ is always wrong.

For one-to-one simulations (Fig. 1b), $\hat{f}_{o:o}$ diverges from $f$ when $n$ increases: it is therefore not a consistent estimator in the statistical sense$^4$, which may seem surprising for a maximum likelihood estimator since the model on which it is based is correct by construction. However, all the demonstrations of consistency of maximum likelihood estimators we found in the literature (e.g. Kendall & Stuart (1979)) rest on the assumption that data are independent, i.e. that the overall likelihood is the product of the probabilities of each datum, which is wrong for $L_{o:o}$ (cf. Eq. (39)).

On the other hand, $L_{s:o}$ is expressed as a product of terms (Eq. (15)), each related to a single $K$-object. Although each of these terms involves several $K'$-sources, this may explain why $\hat{f}_{s:o}$ and $\hat{f}^{*}_{o:s}$, its equivalent for one-to-one simulations, do not show any bias in Fig. 1b. Quoting Le Cam (1990), "Maximum likelihood estimates computed with all the information available may turn out to be inconsistent. Throwing away a substantial part of the information may render them consistent".

$\hat{f}_{o:s}$ behaves slightly better than for several-to-one simulations, but is still a very poor estimator of $f$.



7.2. Simultaneous estimation of f and $\mathring{\sigma}$

How do different estimators of $f$ and $\mathring{\sigma}$ behave when positional uncertainties are also unknown? We show in Fig. 2 the result of simulations with the same inputs as in Sect. 7.1 except that $n = n' = 2 \times 10^{?}$. The likelihood $L_{s:o}$ peaks very close to the input value of $x = (f, \mathring{\sigma})$ for both types of simulations: $\hat{x}_{s:o}$ is therefore an unbiased estimator of $x$ in this case too. On the other hand, $\hat{x}_{o:o}$ is clearly wrong for several-to-one simulations; and even for one-to-one simulations, it performs worse than $\hat{x}_{s:o}$.

To test the robustness of estimators of $f$, we ran simulations with the same parameters, but with elongated positional uncertainty ellipses: we took $a_i = a'_j = 1.5 \times 10^{-?}$ rad and $b_i = b'_j = a_i/3$ for all $(M_i, M'_j) \in K \times K'$. These ellipses were randomly oriented, i.e. position angles (cf. App. A) $\beta_i$ and $\beta'_j$ have uniform random values in $[0, \pi[$. We then estimated $f$ without knowing these uncertainties (see Fig. 3). Although the model from which the parameters are fitted is here inaccurate ($\mathring{\sigma}$ is only a typical positional uncertainty on the relative positions of any couple $(M_i, M'_j)$), the input value of $f$ is still recovered by $\hat{f}_{s:o}$, but not by $\hat{f}_{o:o}$: $\hat{f}_{s:o}$ is therefore a robust estimator too.



7.3. Choice of model and recommendations

Now, given two catalogs, which model should one choose to compute all the probabilities: several-to-one, one-to-one or one-to-several associations? One might think of adopting the one with the highest likelihood, but this fails, as evidenced by Fig. 4: for the range

$^4$ A consistent estimator is a statistic converging to the true value of a parameter when the size of the sample from which it is derived increases. Note that the concept of consistency is not so clear in the context of this paper, as there are two sample sizes, $n$ and $n'$.
Fig. 2. Contour lines of $L_{s:o}$ (solid) and $L_{o:o}$ (dashed) in the $(f, \mathring{\sigma})$ plane. Input parameters are the same as in Fig. 1 except that $n = n' = 2 \times 10^{?}$; the input value of $(f, \mathring{\sigma})$ is indicated by a cross. (a) Several-to-one simulations. (b) One-to-one simulations.

Fig. 3. Contour lines of $L_{s:o}$ (solid) and $L_{o:o}$ (dashed) in the $(f, \mathring{\sigma})$ plane. Input parameters are the same as in Fig. 2 except that positional uncertainty ellipses are elongated and randomly oriented (for details, see Sect. 7.2, 2nd paragraph); the input value of $f$ is indicated by a dotted line. (a) Several-to-one simulations. (b) One-to-one simulations.



of parameters considered here, $L_{s:o}$ is always larger than $L_{o:o}$, even for one-to-one simulations (this however becomes wrong when $n \gg n'$ and $f \approx 1$, in the same way as $L_{o:s} < L_{o:o}$ on Fig. 4 when $n \ll n'$).

Since $\hat{f}_{s:o}$ is always larger than $\hat{f}^{*}_{o:s}$ for several-to-one simulations and both estimators are nearly equal for one-to-one simulations, we recommend the following procedure to compute the probabilities of association $P(A_{i,j} \mid C \cap C')$:

- if $\hat{f}_{s:o}/n' \not\approx \hat{f}'_{o:s}/n$, use
  - $P_{s:o}$ (Eq. (12)) with $f = \hat{f}_{s:o}$ if $\hat{f}_{s:o}/n' > \hat{f}'_{o:s}/n$, and
  - $P_{o:s}$ (Eq. (48)) with $f' = \hat{f}'_{o:s}$ if $\hat{f}_{s:o}/n' < \hat{f}'_{o:s}/n$;

- if $\hat{f}_{s:o}/n' \approx \hat{f}'_{o:s}/n$, use $P_{o:o}$ (Eq. (38)). As $\hat{f}_{s:o}$ and $\hat{f}'_{o:s}$ are, respectively, good estimators of $f$ and $f'$ in the case of one-to-one simulations and as one must then have $f/n' = f'/n$, we suggest for the sake of symmetry to take $f = \hat{f}^{*}_{o:o} := (\hat{f}_{s:o}\, \hat{f}^{*}_{o:s})^{1/2}$.

If positional uncertainties are also unknown, robust estimates of $\mathring{\sigma}$ and $\mathring{\nu}$ may be computed in the same way.
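The recommended procedure can be written as a small decision function. The sketch below is only illustrative: the function name is hypothetical and, since the text does not quantify what "approximately equal" means, the relative tolerance `rel_tol` is an arbitrary assumption to be tuned by the user.

```python
import math


def choose_association_model(f_so, f_prime_os, n, n_prime, rel_tol=0.05):
    """Select the association model from the several-to-one estimate of f (f_so)
    and the one-to-several estimate of f' (f_prime_os).

    Returns the model name and the parameter value to use with it.
    rel_tol is an assumed threshold, not a value given in the text.
    """
    lhs = f_so / n_prime
    rhs = f_prime_os / n
    if math.isclose(lhs, rhs, rel_tol=rel_tol):
        # One-to-one case: symmetrized estimate f = (f_so * n' f'_os / n)^(1/2).
        f_star_os = n_prime * f_prime_os / n
        return "one-to-one", {"f": math.sqrt(f_so * f_star_os)}
    if lhs > rhs:
        return "several-to-one", {"f": f_so}
    return "one-to-several", {"f_prime": f_prime_os}
```

As a usage example, `choose_association_model(0.5, 0.25, 1000, 2000)` selects the one-to-one model with $f = 0.5$, because $0.5/2000 = 0.25/1000$.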



8. Conclusion 

In this paper, we computed the probabilities of positional association of sources between two catalogs $K$ and $K'$ under two different assumptions: first, the easy case where several $K$-objects may share the same $K'$-counterpart; then, the more natural but numerically intensive case of one-to-one associations only between $K$ and $K'$.

These probabilities depend on at least one unknown parameter: the fraction of sources with a counterpart. If the positional 
uncertainties are unknown, other parameters are required to compute the probabilities. We calculated the likelihoods to observe 
Fig. 4. Normalized mean value of different likelihoods at their maximum as a function of $n$. Simulations are the same as in Fig. 1. (a) Several-to-one simulations. (b) One-to-one simulations.



all the $K$- and $K'$-sources at their registered positions under the two assumptions described above, and estimated the unknown parameters by maximizing these likelihoods.

These relations were implemented in a code, Aspects, which we make public and with which we analyzed all-sky several-to-one and one-to-one simulations. After this study, it is clear that maximizing the several-to-one likelihood provides more reliable estimators of unknown parameters than maximizing the one-to-one likelihood (this does not mean that one should use the expression of $P_{s:o}$ (Eq. (12)) instead of that of $P_{o:o}$ (Eq. (48)) to compute probabilities of association in the case of one-to-one associations!). We could also make recommendations on the choice of the source association model and on the best estimators of unknown parameters.

In the simulations, we assumed that the density of $K$- and $K'$-sources was uniform on the sky area $S$: the quantities $\xi_{i,0}$ and $\xi_{0,j}$ used to compute the probabilities are then equal to $1/S$. If the density of objects is not uniform, one might take $\xi_{i,0} = \rho(M_i)/n$ and $\xi_{0,j} = \rho'(M'_j)/n'$, where $\rho$ and $\rho'$ are, respectively, the local surface densities of $K$- and $K'$-sources, but if the $\rho'/\rho$ ratio varies on the sky, so will the fraction of sources with a counterpart - something which we did not try to model. Considering side effects or clustering, or taking into account priors on the SED of objects was also beyond the scope of this paper.

In spite of these limitations, Aspects is a robust tool which should help astronomers to cross-identify astrophysical sources automatically, efficiently and reliably.



Appendix A: Covariance matrix 

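Let us first recall a few standard results. The probability that a $q$-dimensional normally distributed random vector $W$ of mean $\mu$ falls in some domain $\Omega$ is

\[ \int_{\Omega} \frac{\exp\bigl(-\frac{1}{2}\, [w - \mu]_B^T \cdot \Gamma_B^{-1} \cdot [w - \mu]_B\bigr)}{(2\pi)^{q/2}\, (\det \Gamma_B)^{1/2}}\, d^q w_B, \]

where $B := (u_1, \ldots, u_q)$ is a basis, $w_B := (w_1, \ldots, w_q)^T$ is the column vector in $B$ of $w = \sum_{i=1}^{q} w_i u_i$, $d^q w_B := \prod_{i=1}^{q} dw_i$ and $\Gamma_B$ is the covariance matrix of $W$ in $B$. We note this $W_B \sim G_q(\mu_B, \Gamma_B)$.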

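In another basis $B' := (u'_1, \ldots, u'_q)$, one has $w_B = T_{B \to B'} \cdot w_{B'}$, where $T_{B \to B'}$ is the transformation matrix from $B$ to $B'$ (i.e. $u'_j = \sum_{i=1}^{q} (T_{B \to B'})_{i,j}\, u_i$). Since $d^q w_B = |\det T_{B \to B'}|\, d^q w_{B'}$ and

\[ (w - \mu)_B^T \cdot \Gamma_B^{-1} \cdot (w - \mu)_B = (w - \mu)_{B'}^T \cdot \bigl(T_{B \to B'}^{-1} \cdot \Gamma_B \cdot [T_{B \to B'}^{-1}]^T\bigr)^{-1} \cdot (w - \mu)_{B'}, \]

one still obtains

\[ \int_{\Omega} \frac{\exp\bigl(-\frac{1}{2}\, [w - \mu]_{B'}^T \cdot \Gamma_{B'}^{-1} \cdot [w - \mu]_{B'}\bigr)}{(2\pi)^{q/2}\, (\det \Gamma_{B'})^{1/2}}\, d^q w_{B'}, \]

where $\Gamma_{B'} := T_{B \to B'}^{-1} \cdot \Gamma_B \cdot (T_{B \to B'}^{-1})^T$ is the covariance matrix of $W$ in $B'$. In the following, $B$ and $B'$ are orthonormal bases, so $T_{B \to B'}$ is a rotation matrix. From $T_{B \to B'}^{-1} = T_{B \to B'}^T$, one gets $\Gamma_{B'} = T_{B \to B'}^T \cdot \Gamma_B \cdot T_{B \to B'}$.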

In a common basis, for independent random vectors $W_1 \sim G_q(\mu_1, \Gamma_1)$ and $W_2 \sim G_q(\mu_2, \Gamma_2)$, we have

\[ W_1 \pm W_2 \sim G_q(\mu_1 \pm \mu_2, \Gamma_1 + \Gamma_2). \]

We now use these results to obtain the covariance matrix of vector $r_{i,j} := r'_j - r_i$, where $r_i$ and $r'_j$ are, respectively, the observed positions of source $M_i$ of $K$ and of its counterpart $M'_j$ in $K'$. We note $r^0_i$ and $r'^0_j$ their true positions. One has

\[ r_{i,j} = (r'_j - r'^0_j) + (r'^0_j - r^0_i) + (r^0_i - r_i). \]
We drop the subscript and the "prime" symbol in the following whenever an expression depends on either $M_i$ or $M'_j$ only.

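Let $(u_x, u_y, u_z)$ be a direct orthonormal basis, with $u_z$ oriented from the Earth's center $O$ to the North Celestial Pole and $u_x$ from $O$ to the Vernal Point. At a point $M$ of right ascension $\alpha$ and declination $\delta$, a direct orthonormal basis $(u_r, u_\alpha, u_\delta)$ is defined by

\[ u_r := \frac{OM}{\|OM\|} = \cos\delta \cos\alpha\, u_x + \cos\delta \sin\alpha\, u_y + \sin\delta\, u_z, \]

\[ u_\alpha := \frac{\partial u_r/\partial\alpha}{\|\partial u_r/\partial\alpha\|} = -\sin\alpha\, u_x + \cos\alpha\, u_y, \]

\[ u_\delta := \frac{\partial u_r/\partial\delta}{\|\partial u_r/\partial\delta\|} = -\sin\delta \cos\alpha\, u_x - \sin\delta \sin\alpha\, u_y + \cos\delta\, u_z. \]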

The uncertainty ellipse on the position of $M$ is characterized by the lengths $a$ and $b$ of the semi-major and semi-minor axes, and by the position angle $\beta$ between the North and the semi-major axis. Let $u_a$ and $u_b$ be unit vectors directed respectively along the major and the minor axes, and such that $(u_r, u_a, u_b)$ is a direct orthonormal basis and that $\beta := (u_\delta, u_a)$ is in $[0, \pi[$ when counted eastward. Since $(u_\alpha, u_\delta)$ is obtained from $(u_a, u_b)$ by a $(\beta - \pi/2)$-counterclockwise rotation in the plane oriented by $+u_r$, $T_{(u_a, u_b) \to (u_\alpha, u_\delta)} = \mathrm{Rot}(\beta - \pi/2)$, where

\[ \mathrm{Rot}\, x := \begin{pmatrix} \cos x & -\sin x \\ \sin x & \cos x \end{pmatrix}. \]

Using notation

\[ \mathrm{Diag}(d_1, d_2) := \begin{pmatrix} d_1 & 0 \\ 0 & d_2 \end{pmatrix} \]

for diagonal matrices, one has$^5$ $\Gamma_{(u_a, u_b)} = \mathrm{Diag}(a^2, b^2)$ and $\Gamma_{(u_\alpha, u_\delta)} = \mathrm{Rot}^T(\beta - \pi/2) \cdot \mathrm{Diag}(a^2, b^2) \cdot \mathrm{Rot}(\beta - \pi/2)$.
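As a numerical illustration of the last expression, the sketch below builds the covariance matrix in the $(u_\alpha, u_\delta)$ basis from $a$, $b$ and $\beta$. It is written in plain Python with hypothetical helper names, and is not extracted from Aspects.

```python
import math


def rot(x):
    """Rotation matrix Rot x = [[cos x, -sin x], [sin x, cos x]]."""
    return [[math.cos(x), -math.sin(x)], [math.sin(x), math.cos(x)]]


def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def covariance_alpha_delta(a, b, beta):
    """Rot^T(beta - pi/2) . Diag(a^2, b^2) . Rot(beta - pi/2).

    a, b : semi-major and semi-minor axes of the uncertainty ellipse
    beta : position angle of the major axis, counted eastward from the North
    The first coordinate is along u_alpha (East), the second along u_delta (North).
    """
    r = rot(beta - math.pi / 2.0)
    diag = [[a * a, 0.0], [0.0, b * b]]
    return matmul(matmul(transpose(r), diag), r)
```

For $\beta = 0$ (major axis pointing North), the variance is $b^2$ along $u_\alpha$ and $a^2$ along $u_\delta$; for $\beta = \pi/2$ the two are exchanged. The trace $a^2 + b^2$ and the determinant $a^2 b^2$ do not depend on $\beta$.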

As noticed by Pineau et al. (2011), for nearby sources $M_i$ and $M'_j$ close to the Poles, it is not always true that $(u_{\alpha_i}, u_{\delta_i}) \approx (u_{\alpha'_j}, u_{\delta'_j})$, so one needs to define a common basis. We use the same basis as them, noted $(t, n)$ below. While the results we get are intrinsically the same, some people may find our expressions more convenient.

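Denote $\psi := (u_{r_i}, u_{r'_j}) \in [0, \pi]$ the angular distance between $M_i$ and $M'_j$, and $n := u_{r_i} \times u_{r'_j}/\|u_{r_i} \times u_{r'_j}\|$ a unit vector perpendicular to the plane $(O, M_i, M'_j)$. One has $u_{r_i} \cdot u_{r'_j} = \cos\psi$, so

\[ \psi = \arccos(\cos\delta_i \cos\delta'_j \cos[\alpha'_j - \alpha_i] + \sin\delta_i \sin\delta'_j), \tag{A.1} \]

and $\|u_{r_i} \times u_{r'_j}\| = \sin\psi$.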
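Equation (A.1) translates directly into code. The following Python sketch (hypothetical function name, angles in radians) clamps the cosine to $[-1, 1]$ to guard against rounding errors; note that the arccosine form loses relative accuracy for very small separations, so a production code might prefer another formulation.

```python
import math


def angular_distance(ra_i, dec_i, ra_j, dec_j):
    """Angular distance psi (rad) between two sky positions, from Eq. (A.1)."""
    cos_psi = (math.cos(dec_i) * math.cos(dec_j) * math.cos(ra_j - ra_i)
               + math.sin(dec_i) * math.sin(dec_j))
    # Rounding may push cos_psi slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_psi)))
```

Two points on the equator separated by a quarter of a turn in right ascension are at $\psi = \pi/2$, and the two Poles are at $\psi = \pi$.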

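Let $\gamma_i := (n, u_{\delta_i})$ and $\gamma'_j := (n, u_{\delta'_j})$ be angles oriented clockwise around $+u_{r_i}$ and $+u_{r'_j}$, respectively. Angle $\gamma_i$ is fully determined by the following expressions:

\[ \cos\gamma_i = n \cdot u_{\delta_i} = \frac{(u_{r_i} \times u_{r'_j}) \cdot u_{\delta_i}}{\sin\psi} = \frac{(u_{\delta_i} \times u_{r_i}) \cdot u_{r'_j}}{\sin\psi} = \frac{u_{\alpha_i} \cdot u_{r'_j}}{\sin\psi} = \frac{\cos\delta'_j \sin(\alpha'_j - \alpha_i)}{\sin\psi}, \]

\[ \sin\gamma_i = -n \cdot u_{\alpha_i} = \frac{(u_{r'_j} \times u_{r_i}) \cdot u_{\alpha_i}}{\sin\psi} = \frac{(u_{r_i} \times u_{\alpha_i}) \cdot u_{r'_j}}{\sin\psi} = \frac{u_{\delta_i} \cdot u_{r'_j}}{\sin\psi} = \frac{\cos\delta_i \sin\delta'_j - \sin\delta_i \cos\delta'_j \cos(\alpha'_j - \alpha_i)}{\sin\psi}. \]

Similarly,

\[ \cos\gamma'_j = \frac{\cos\delta_i \sin(\alpha'_j - \alpha_i)}{\sin\psi} \quad \text{and} \quad \sin\gamma'_j = \frac{\cos\delta_i \sin\delta'_j \cos(\alpha'_j - \alpha_i) - \sin\delta_i \cos\delta'_j}{\sin\psi}. \]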

Let $t := n \times u_{r_i}$: $t$ is a unit vector tangent in $M_i$ to the minor arc of great circle going from $M_i$ to $M'_j$. Project the sphere on the plane $(M_i, t, n)$ tangent to the sphere in $M_i$ (which specific projection does not matter since we consider only $K'$-sources in the neighborhood of $M_i$): one has $r_{i,j} \approx \psi\, t$, and the basis $(t, n)$ is obtained from $(u_a, u_b)$ by a $(\beta + \gamma - \pi/2)$-counterclockwise rotation around $+u_r$, so, in $(t, n)$,

\[ \Gamma_i = \mathrm{Rot}^T(\beta_i + \gamma_i - \pi/2) \cdot \mathrm{Diag}(a_i^2, b_i^2) \cdot \mathrm{Rot}(\beta_i + \gamma_i - \pi/2) \]

and

\[ \Gamma'_j = \mathrm{Rot}^T(\beta'_j + \gamma'_j - \pi/2) \cdot \mathrm{Diag}(a'^2_j, b'^2_j) \cdot \mathrm{Rot}(\beta'_j + \gamma'_j - \pi/2). \]

As $r_i - r^0_i \sim G_2(0, \Gamma_i)$ and $r'_j - r'^0_j \sim G_2(0, \Gamma'_j)$, one has $r_{i,j} \sim G_2(0, \Gamma_{i,j})$ if the true positions are identical, with $\Gamma_{i,j} := \Gamma_i + \Gamma'_j$.
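The chain of results above (angles $\psi$, $\gamma_i$, $\gamma'_j$, covariance matrices in $(t, n)$, sum of the two matrices) can be put together in a short sketch. The Python below is illustrative only: the function names are hypothetical, and it assumes that the quantity of interest is the bivariate normal density of $r_{i,j}$, evaluated at $r_{i,j} \approx \psi\, t$, i.e. at coordinates $(\psi, 0)$ in $(t, n)$.

```python
import math


def ellipse_covariance(a, b, angle):
    """Rot^T(angle) . Diag(a^2, b^2) . Rot(angle), written out explicitly."""
    c, s = math.cos(angle), math.sin(angle)
    a2, b2 = a * a, b * b
    off = c * s * (b2 - a2)
    return [[c * c * a2 + s * s * b2, off], [off, s * s * a2 + c * c * b2]]


def separation_angles(ra_i, dec_i, ra_j, dec_j):
    """Return psi (Eq. (A.1)), gamma_i and gamma'_j.

    The common factor 1/sin(psi) of the sines and cosines cancels in atan2.
    """
    d_ra = ra_j - ra_i
    cos_psi = (math.cos(dec_i) * math.cos(dec_j) * math.cos(d_ra)
               + math.sin(dec_i) * math.sin(dec_j))
    psi = math.acos(max(-1.0, min(1.0, cos_psi)))
    gamma_i = math.atan2(
        math.cos(dec_i) * math.sin(dec_j) - math.sin(dec_i) * math.cos(dec_j) * math.cos(d_ra),
        math.cos(dec_j) * math.sin(d_ra))
    gamma_j = math.atan2(
        math.cos(dec_i) * math.sin(dec_j) * math.cos(d_ra) - math.sin(dec_i) * math.cos(dec_j),
        math.cos(dec_i) * math.sin(d_ra))
    return psi, gamma_i, gamma_j


def association_density(ra_i, dec_i, ra_j, dec_j, ellipse_i, ellipse_j):
    """Density of r_ij ~ G_2(0, Gamma_i + Gamma'_j) at the observed separation.

    ellipse_i, ellipse_j : (a, b, beta) of the two uncertainty ellipses (rad).
    """
    psi, gamma_i, gamma_j = separation_angles(ra_i, dec_i, ra_j, dec_j)
    a_i, b_i, beta_i = ellipse_i
    a_j, b_j, beta_j = ellipse_j
    cov_i = ellipse_covariance(a_i, b_i, beta_i + gamma_i - math.pi / 2.0)
    cov_j = ellipse_covariance(a_j, b_j, beta_j + gamma_j - math.pi / 2.0)
    g = [[cov_i[r][c] + cov_j[r][c] for c in range(2)] for r in range(2)]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    # With r_ij = (psi, 0) in (t, n), r^T . Gamma^-1 . r = psi^2 * g[1][1] / det.
    quad = psi * psi * g[1][1] / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))
```

For circular ellipses ($a_i = b_i = \sigma$, $a'_j = b'_j = \sigma'$), the result reduces to $\exp(-\psi^2/[2(\sigma^2 + \sigma'^2)])/(2\pi[\sigma^2 + \sigma'^2])$, whatever the position angles.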

If the positional uncertainty on $M_i$ is unknown, one may model it as $\Gamma_i = \sigma^2\, \mathrm{Diag}(1, 1)$, with the same $\sigma$ for all $K$-sources, and derive an estimate of $\mathring{\sigma} := \sigma$ by maximizing the likelihood to observe the distribution of $K$- and $K'$-sources (see Sects. 3.2 and 4.2). For a galaxy, however, the positional uncertainty on its center is likely to increase with its size. If the position angle $\theta_i$ (counted eastward from the North) and the major and minor diameters $D_i$ and $d_i$ of the best-fitting ellipse of some isophote are known for

$^5$ We seize this opportunity to correct equations (A.8) to (A.11) of Pineau et al. (2011): $a$ and $b$ should be replaced by their squares in these formulae.



$M_i$ (for instance, parameters PA, $D_{25}$ and $d_{25} := D_{25}/R_{25}$ taken from the RC3 catalog (de Vaucouleurs et al. 1991) or HyperLeda (Paturel et al. 2003)), one may model the positional uncertainty as

\[ \Gamma_i = \mathrm{Rot}^T(\theta_i + \gamma_i - \pi/2) \cdot \mathrm{Diag}(\sigma^2 + [\nu D_i]^2, \sigma^2 + [\nu d_i]^2) \cdot \mathrm{Rot}(\theta_i + \gamma_i - \pi/2) \]
\[ \phantom{\Gamma_i} = \sigma^2\, \mathrm{Diag}(1, 1) + \nu^2\, \mathrm{Rot}^T(\theta_i + \gamma_i - \pi/2) \cdot \mathrm{Diag}(D_i^2, d_i^2) \cdot \mathrm{Rot}(\theta_i + \gamma_i - \pi/2), \]

and derive estimates of $\mathring{\sigma} := \sigma$ and $\mathring{\nu} := \nu$ from the likelihood. Such a technique might indeed be used to estimate the accuracy of coordinates in some catalog (see Paturel & Petit (1999) for another method).

If the positional uncertainty on $M'_j$ is also unknown, one can put

\[ \Gamma'_j = \sigma'^2\, \mathrm{Diag}(1, 1) + \nu'^2\, \mathrm{Rot}^T(\theta'_j + \gamma'_j - \pi/2) \cdot \mathrm{Diag}(D_i^2, d_i^2) \cdot \mathrm{Rot}(\theta'_j + \gamma'_j - \pi/2) \]

with the same $\sigma'$ and $\nu'$ for all $K'$-sources. As $\gamma'_j + \theta'_j = \gamma_i + \theta_i$, only estimates of $\mathring{\sigma} := (\sigma^2 + \sigma'^2)^{1/2}$ and $\mathring{\nu} := (\nu^2 + \nu'^2)^{1/2}$ may be obtained$^6$ from the likelihood, not of $\sigma$, $\sigma'$, $\nu$ or $\nu'$ themselves.

A similar technique can be applied if the true centers of a source in $K$ and of its counterpart in $K'$ may differ. This might be in particular useful when associating galaxies from an optical catalog and from an ultraviolet or far-infrared catalog, because, while the optical is dominated by smoothly-distributed evolved stellar populations, the ultraviolet and the far-infrared mainly trace star-forming regions. Observations of galaxies (e.g. Kuchinski et al. (2000)) have indeed shown that galaxies are very patchy in the ultraviolet, and the same has been observed in the far-infrared. As the angular distance between the true centers should increase with the size of the galaxy, one may model this as $r'^0_j - r^0_i \sim G_2(0, \Gamma^0_{i,j})$, where $\Gamma^0_{i,j} = \nu_0^2\, \mathrm{Rot}^T(\theta_i + \gamma_i - \pi/2) \cdot \mathrm{Diag}(D_i^2, d_i^2) \cdot \mathrm{Rot}(\theta_i + \gamma_i - \pi/2)$.

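In the most general case,

\[ r_{i,j} \sim G_2(0, \Gamma_{i,j}), \]

with $\Gamma_{i,j} := \Gamma_i + \Gamma'_j + \Gamma^0_{i,j}$. Once again, if $\sigma$, $\sigma'$, $\nu$, $\nu'$ and $\nu_0$ are unknown, quantities $\mathring{\sigma} := (\sigma^2 + \sigma'^2)^{1/2}$ and $\mathring{\nu} := (\nu^2 + \nu'^2 + \nu_0^2)^{1/2}$ may be estimated as indicated in Sects. 3.2 and 4.2.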



Appendix B: One-to-one case computations for n > n' 

Although one can always define $K$ and $K'$ in the one-to-one case so that $n \leq n'$, it may be interesting to treat the case where $n > n'$ to check the consistency of numerical calculations.

If « > n', n and / are replaced by n' and /' in Eq. (|35] |. Eq. ( |36] | becomes 

ji=0 >2=0 j„=Q • k=l ji=0 j2=Q j„=Q k=l 

j\iJo jiiJi j„iJ„-\ j\iJo jiiJi j„iJ„-i 

where 77^ „ := (1 -f')^k,o and 77^ ^.^ f ^k,jj(n - #Jk-i) if jk > 0. 
Similarly, 



II II n' n' n 

p.o(A,,,ncic') = ^(i-/')"'-"^i,X--- Z Z •■■ Z He.' 



j\=0 j,-\=0 /;+i=0 i„=0 k=l 

where Cuo-i^' /')^/,o, Cij f ^Ujln = f^ij/n' if j > 0, t/^^q = (1 - /')f*,o and rj'^j^ := /' ,,/(« - #/;_,) if;, > 0. 
The practical expression of the likelihood (cf. Eq. (l47l l) is given by 

u.,^ = (1 -/')"'-"(n ^-^ n — )fl^o,„ 

Vj PoAAiM I C n C n nr=i Ak.o] 1 )J 

where Po.oiAi^o \ C CiC n pljl'i ^t.o) is computed as detailed in Sect. 15.2.21 from Eq. ( |46] |. with ^ replaced by (' and with /' and n 
instead of / and n' in the expressions of 77'^ and 77*'*. 
$^6$ However, as noticed by de Vaucouleurs & Head (1978) in a different context, if three samples with unknown uncertainties $\sigma_i$ ($i \in [1, 3]$) are available and if the $\mathring{\sigma}_{i,j} := (\sigma_i^2 + \sigma_j^2)^{1/2}$ may be estimated for all the pairs $(i, j) \in [1, 3]^2$, as in our case, then $\sigma_i$ may be determined for each sample. Paturel & Petit (1999) used this technique to compute the accuracy of galaxy coordinates.






M. Fioc: Probabilistic positional association of catalogs of astrophysical sources: the Aspects code 

References 

Budavári, T. & Szalay, A. S. 2008, ApJ, 679, 301
Condon, J. J., Balonek, T. J., & Jauncey, D. L. 1975, AJ, 80, 887
de Ruiter, H. R., Arp, H. C., & Willis, A. G. 1977, A&AS, 28, 211
de Vaucouleurs, G., de Vaucouleurs, A., Corwin, Jr., H. G., et al. 1991, Third Reference Catalogue of Bright Galaxies, ed. de Vaucouleurs, G., de Vaucouleurs, A., Corwin, H. G., Jr., Buta, R. J., Paturel, G., & Fouqué, P.
de Vaucouleurs, G. & Head, C. 1978, ApJS, 36, 439
Kendall, M. & Stuart, A. 1979, The advanced theory of statistics. Vol. 2: Inference and relationship, ed. Kendall, M. & Stuart, A.
Kuchinski, L. E., Freedman, W. L., Madore, B. F., et al. 2000, ApJS, 131, 441
Le Cam, L. 1990, Internat. Statist. Rev., 58, 153
Moshir, M., Copan, G., Conrow, T., et al. 1993, VizieR Online Data Catalog, 2156
Moshir, M., Kopman, G., & Conrow, T. A. O. 1992, IRAS Faint Source Survey, Explanatory supplement version 2, ed. Moshir, M., Kopman, G., & Conrow, T. A. O.
Paturel, G., Bottinelli, L., & Gouguenheim, L. 1995, Astrophysical Letters and Communications, 31, 13
Paturel, G. & Petit, C. 1999, A&A, 352, 431
Paturel, G., Petit, C., Prugniel, P., et al. 2003, VizieR Online Data Catalog, 7237
Pineau, F.-X., Motch, C., Carrera, F., et al. 2011, A&A, 527, A126
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1992, Numerical recipes in Fortran. The art of scientific computing
Prestage, R. M. & Peacock, J. A. 1983, MNRAS, 204, 355
Rutledge, R. E., Brunner, R. J., Prince, T. A., & Lonsdale, C. 2000, ApJS, 131, 335
Sutherland, W. & Saunders, W. 1992, MNRAS, 259, 413
Wolfram, S. 1996, The Mathematica book, ed. Wolfram, S.






